FEATURES 


32 bit architecture 

33 ns internal cycle time 

30 MIPS (peak) instruction rate 

4.3 Mflops (peak) instruction rate 

64 bit on-chip floating point unit which conforms to 
IEEE 754 

4 Kbytes on-chip static RAM 

120 Mbytes/sec sustained data rate to internal memory 
4 Gbytes directly addressable external memory 

40 Mbytes/sec sustained data rate to external memory 
630 ns response to interrupts 

Four INMOS serial links 5/10/20 Mbits/sec 
Bi-directional data rate of 2.4 Mbytes/sec per link 

High performance graphics support with block move 
instructions 

Boot from ROM or communication links 

Single 5 MHz clock input 

Single +5V ±5% power supply 

MIL-STD-883C processing will be available 


APPLICATIONS 


Scientific and mathematical applications 
High speed multi processor systems 
High performance graphics processing 
Supercomputers 

Workstations and workstation clusters 
Digital signal processing 

Accelerator processors 

Distributed databases 

System simulation 
Telecommunications 

Robotics 

Fault tolerant systems 

Image processing 

Pattern recognition 

Artificial intelligence 
1 Introduction 


The IMS T800 transputer is a 32 bit CMOS microcomputer with a 64 bit floating point unit and graphics support. 
It has 4 Kbytes on-chip RAM for high speed processing, a configurable memory interface and four standard 
INMOS communication links. The instruction set achieves efficient implementation of high level languages 
and provides direct support for the occam model of concurrency when using either a single transputer or a 
network. Procedure calls, process switching and typical interrupt latency are sub-microsecond. 


For convenience of description, the IMS T800 operation is split into the basic blocks shown in figure 1.1. 


Figure 1.1 IMS T800 block diagram (Floating Point Unit, 32 bit Processor, System Services, Timers, 4 Kbytes On-chip RAM, four Link Interfaces, Event, External Memory Interface)


The processor speed of a device can be pin-selected in stages from 17.5 MHz up to the maximum allowed 
for the part. A device running at 30 MHz achieves an instruction throughput of 30 MIPS peak and 15 MIPS 
sustained. The extended temperature version of the device complies with MIL-STD-883C. 


The IMS T800 provides high performance arithmetic and floating point operations. The 64 bit floating point unit 
provides single and double length operation to the ANSI-IEEE 754-1985 standard for floating point arithmetic. 
It is able to perform floating point operations concurrently with the processor, sustaining a rate of 1.5 Mflops 
at a processor speed of 20 MHz and 2.25 Mflops at 30 MHz. 


High performance graphics support is provided by microcoded block move instructions which operate at the 
speed of memory. The two-dimensional block move instructions provide for contiguous block moves as well 
as block copying of either non-zero bytes of data only or zero bytes only. Block move instructions can be used 
to provide graphics operations such as text manipulation, windowing, panning, scrolling and screen updating. 
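The behaviour of the three 2D copy forms can be sketched on a flat byte array. This is an illustrative model only: the parameter names (`width`, `rows`, `src_stride`, `dst_stride`, `mode`) are hypothetical, the real instructions are set up with move2dinit and obtain their operands as described in the instruction set tables, and overlapping source and destination blocks are not considered.

```python
def move2d(mem: bytearray, src: int, dst: int, width: int, rows: int,
           src_stride: int, dst_stride: int, mode: str = "all") -> None:
    """Copy `rows` rows of `width` bytes between two 2D regions of mem.

    Each new row starts src_stride (or dst_stride) bytes after the previous one.
    mode "all" copies every byte, "nonzero" only non-zero source bytes and
    "zero" only zero source bytes; destination bytes not copied are unchanged.
    """
    if mode not in ("all", "nonzero", "zero"):
        raise ValueError(f"unknown mode {mode!r}")
    for r in range(rows):
        s = src + r * src_stride
        d = dst + r * dst_stride
        for i in range(width):
            b = mem[s + i]
            copy = (mode == "all"
                    or (mode == "nonzero" and b != 0)
                    or (mode == "zero" and b == 0))
            if copy:
                mem[d + i] = b
```

With mode "nonzero", zero bytes in the source behave as transparent pixels, which is one way a sprite with a transparent background could be overlaid onto a screen buffer.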


Cyclic redundancy checking (CRC) instructions are available for use on arbitrary length serial data streams, 
to provide error detection where data integrity is critical. Another feature of the IMS T800, useful for pattern 
recognition, is the facility to count bits set in a word. 
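The kind of bit-serial CRC that crcword and crcbyte accelerate, and the bit count, can be sketched in software as follows. This is a generic most-significant-bit-first CRC under assumed conventions (initial value, generator polynomial and bit order are parameters here); it is not a statement of how the T800 instructions order their operands or bits. With the CRC-32 generator 0x04C11DB7 and an all-ones initial value, it should reproduce the published CRC-32/MPEG-2 check value 0x0376E6E7 for the ASCII string "123456789".

```python
MASK32 = 0xFFFFFFFF
CRC32_POLY = 0x04C11DB7  # example generator polynomial (CRC-32), chosen for illustration


def crc_bits(crc: int, data: int, nbits: int, poly: int = CRC32_POLY) -> int:
    """Shift an nbits-wide data value, most significant bit first, through a 32 bit CRC."""
    for i in reversed(range(nbits)):
        feedback = ((crc >> 31) ^ (data >> i)) & 1
        crc = (crc << 1) & MASK32
        if feedback:
            crc ^= poly
    return crc


def crc_of_bytes(message: bytes, init: int = MASK32, poly: int = CRC32_POLY) -> int:
    """CRC of a byte stream of arbitrary length, processed one byte at a time."""
    crc = init
    for byte in message:
        crc = crc_bits(crc, byte, 8, poly)
    return crc


def bit_count(word: int) -> int:
    """Number of bits set in a 32 bit word."""
    return bin(word & MASK32).count("1")
```

Because the CRC is processed most significant bit first, feeding a 32 bit word in one step gives the same result as feeding its four bytes in order, which is why a word-at-a-time instruction is useful for long streams.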


The IMS T800 can directly access a linear address space of 4 Gbytes. The 32 bit wide memory interface 
uses multiplexed data and address lines and provides a data rate of up to 4 bytes every 100 nanoseconds 
(40 Mbytes/sec) for a 30 MHz device. A configurable memory controller provides all timing, control and DRAM 
refresh signals for a wide variety of mixed memory systems. 
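The cycle time and data rates quoted for a 30 MHz device are related by simple arithmetic. The sketch assumes that internal memory delivers one 32 bit (4 byte) word per processor cycle, which is consistent with the 120 Mbytes/sec figure in the features list; the function names are illustrative.

```python
def cycle_time_ns(processor_mhz: float) -> float:
    """Length of one processor cycle in nanoseconds."""
    return 1000.0 / processor_mhz


def data_rate_mbytes(bytes_per_transfer: int, ns_per_transfer: float) -> float:
    """Sustained data rate in Mbytes/sec (1 byte per ns = 1000 Mbytes/sec)."""
    return bytes_per_transfer * 1000.0 / ns_per_transfer


# 30 MHz part: one 4 byte word per cycle internally, one word per 100 ns externally
internal = data_rate_mbytes(4, cycle_time_ns(30))
external = data_rate_mbytes(4, 100)
print(f"cycle {cycle_time_ns(30):.1f} ns, internal {internal:.0f} Mbytes/sec, "
      f"external {external:.0f} Mbytes/sec")
```

At 30 MHz an external word transfer of 100 ns therefore corresponds to three processor cycles.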


System Services include processor reset and bootstrap control, together with facilities for error analysis. Error 
signals may be daisy-chained in multi-transputer systems. 


The standard INMOS communication links allow networks of transputer family products to be constructed by 
direct point to point connections with no external logic. The IMS T800 links support the standard operating 
speed of 10 Mbits/sec, but also operate at 5 or 20 Mbits/sec. Each link can transfer data bi-directionally at 
up to 2.35 Mbytes/sec. 


The transputer is designed to implement the occam language, detailed in the occam Reference Manual, but 
also efficiently supports other languages such as C, Pascal and Fortran. Access to the transputer at machine 
level is seldom required, but if necessary refer to The Transputer Instruction Set - A Compiler Writers’ Guide. 


This data sheet supplies hardware implementation and characterisation details for the IMS T800. It is intended 
to be read in conjunction with the Transputer Architecture section of the Transputer Databook, which details 
the architecture of the transputer and gives an overview of occam. 
2 Pin designations 


Table 2.1 IMS T800 system services


Pin                    Function
VCC, GND               Power supply and return
CapPlus, CapMinus      External capacitor for internal clock power supply
ClockIn                Input clock
ProcSpeedSelect0-2     Processor speed selectors
Reset                  System reset
Error                  Error indicator
ErrorIn                Error daisychain input
Analyse                Error analysis
BootFromRom            Boot from external ROM or from link
DisableIntRAM          Disable internal RAM
DoNotWire              Must not be wired


Table 2.2 IMS T800 external memory interface


Pin                    Function
ProcClockOut           Processor clock
MemnotWrD0             Multiplexed data bit 0 and write cycle warning
MemnotRfD1             Multiplexed data bit 1 and refresh warning
MemAD2-31              Multiplexed data and address bus
notMemRd               Read strobe
notMemWrB0-3           Four byte-addressing write strobes
notMemS0-4             Five general purpose strobes
notMemRf               Dynamic memory refresh indicator
MemWait                Memory cycle extender
MemReq                 Direct memory access request
MemGranted             Direct memory access granted
MemConfig              Memory configuration data input


Table 2.3 IMS T800 event


Pin                    Function
EventReq               Event request
EventAck               Event request acknowledge


Table 2.4 IMS T800 link


Pin                    Function
LinkIn0-3              Four serial data input channels
LinkOut0-3             Four serial data output channels
LinkSpecial            Select non-standard speed as 5 or 20 Mbits/sec
Link0Special           Select special speed for Link 0
Link123Special         Select special speed for Links 1,2,3


Signal names are prefixed by not if they are active low, otherwise they are active high. 
Pinout details for various packages are given on page 68. 


3 Processor 


The 32 bit processor contains instruction processing logic, instruction and work pointers, and an operand 
register. It directly accesses the high speed 4 Kbyte on-chip memory, which can store data or program. 
Where larger amounts of memory or programs in ROM are required, the processor has access to 4 Gbytes 
of memory via the External Memory Interface (EMI). 


3.1 Registers 


The design of the transputer processor exploits the availability of fast on-chip memory by having only a small 
number of registers; six registers are used in the execution of a sequential process. The small number of 
registers, together with the simplicity of the instruction set, enables the processor to have relatively simple 
(and fast) data-paths and control logic. The six registers are: 


The workspace pointer which points to an area of store where local variables are kept. 
The instruction pointer which points to the next instruction to be executed. 

The operand register which is used in the formation of instruction operands. 

The A, B and C registers which form an evaluation stack. 


A, B and C are sources and destinations for most arithmetic and logical operations. Loading a value into the 
stack pushes B into C, and A into B, before loading A. Storing a value from A pops B into A and C into B. 


Expressions are evaluated on the evaluation stack, and instructions refer to the stack implicitly. For example, 
the add instruction adds the top two values in the stack and places the result on the top of the stack. The use of 
a stack removes the need for instructions to respecify the location of their operands. Statistics gathered from a 
large number of programs show that three registers provide an effective balance between code compactness 
and implementation complexity. 


No hardware mechanism is provided to detect that more than three values have been loaded onto the stack. 
It is easy for the compiler to ensure that this never happens. 
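The load and store behaviour of the A, B and C registers, including the absence of overflow detection, can be modelled in a few lines. This is a behavioural sketch, not a cycle-accurate model: values are unbounded Python integers rather than 32 bit words, and because the datasheet does not say what C holds after a pop, the sketch simply leaves it unchanged.

```python
class EvalStack:
    """Model of the three-register evaluation stack (no overflow detection)."""

    def __init__(self) -> None:
        self.A = self.B = self.C = 0

    def push(self, value: int) -> None:
        # Load: B -> C, A -> B, then value -> A. The old C is silently lost.
        self.C, self.B, self.A = self.B, self.A, value

    def pop(self) -> int:
        # Store: A is returned, B -> A, C -> B. C is left unchanged in this model.
        value = self.A
        self.A, self.B = self.B, self.C
        return value

    def add(self) -> None:
        # add: the top two values are replaced by their sum, and C moves up into B.
        self.A, self.B = self.B + self.A, self.C
```

Pushing a fourth value overwrites the oldest one without any warning, which is why the compiler, not the hardware, must keep expression depth within three.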


Any location in memory can be accessed relative to the workspace pointer register, enabling the workspace to be 
of any size. 


Further register details are given in The Transputer Instruction Set - A Compiler Writers’ Guide. 




Figure 3.1 Registers 
3.2 Instructions 


The instruction set has been designed for simple and efficient compilation of high-level languages. All in- 
structions have the same format, designed to give a compact representation of the operations occurring most 
frequently in programs. 


Each instruction consists of a single byte divided into two 4-bit parts. The four most significant bits of the byte 
are a function code and the four least significant bits are a data value. 


Figure 3.2 Instruction format (bits 7-4: function code; bits 3-0: data value, loaded into the operand register)


3.2.1 Direct functions 


The representation provides for sixteen functions, each with a data value ranging from 0 to 15. Ten of these, 
shown in table 3.1, are used to encode the most important functions. 


Table 3.1 Direct functions


load constant       add constant
load local          store local          load local pointer
load non-local      store non-local
jump                conditional jump     call


The most common operations in a program are the loading of small literal values and the loading and storing 
of one of a small number of variables. The load constant instruction enables values between 0 and 15 to be 
loaded with a single byte instruction. The load local and store local instructions access locations in memory 
relative to the workspace pointer. The first 16 locations can be accessed using a single byte instruction. 


The load non-local and store non-local instructions behave similarly, except that they access locations in 
memory relative to the A register. Compact sequences of these instructions allow efficient access to data 
structures, and provide for simple implementations of the static links or displays used in the implementation 
of high level programming languages such as occam, C, Fortran, Pascal or Ada. 


3.2.2 Prefix functions 


Two more function codes allow the operand of any instruction to be extended in length: prefix and negative 
prefix. 


All instructions are executed by loading the four data bits into the least significant four bits of the operand 
register, which is then used as the instruction’s operand. All instructions except the prefix instructions end by 
clearing the operand register, ready for the next instruction. 


The prefix instruction loads its four data bits into the operand register and then shifts the operand register up 
four places. The negative prefix instruction is similar, except that it complements the operand register before 
shifting it up. Consequently operands can be extended to any length up to the length of the operand register 
by a sequence of prefix instructions. In particular, operands in the range -256 to 255 can be represented 
using one prefix instruction. 
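The encoding rule can be captured in a short recursive function, and the operand register behaviour described above in a small decoder. This is a sketch assuming a 32 bit operand register; the function code values pfix = #2, nfix = #6 and ldc = #4 are those used in the prefix coding examples of the instruction set summary, and the function names are illustrative.

```python
PFIX, NFIX, LDC = 0x2, 0x6, 0x4  # function codes: prefix, negative prefix, load constant
MASK32 = 0xFFFFFFFF


def encode(fn: int, operand: int) -> list[int]:
    """Instruction bytes for direct function fn with an operand of any size."""
    if 0 <= operand < 16:
        return [(fn << 4) | operand]
    if operand >= 16:
        return encode(PFIX, operand >> 4) + [(fn << 4) | (operand & 0xF)]
    # Negative operand: build the complemented high part, ending with nfix.
    return encode(NFIX, (~operand) >> 4) + [(fn << 4) | (operand & 0xF)]


def execute(code: bytes) -> list[tuple[int, int]]:
    """Run bytes through a 32 bit operand register; return (function, operand) pairs."""
    oreg = 0
    out = []
    for byte in code:
        fn, data = byte >> 4, byte & 0xF
        oreg |= data                        # data bits go into the low four bits
        if fn == PFIX:
            oreg = (oreg << 4) & MASK32     # shift up four places
        elif fn == NFIX:
            oreg = (~oreg << 4) & MASK32    # complement, then shift up
        else:
            signed = oreg - (1 << 32) if oreg & 0x80000000 else oreg
            out.append((fn, signed))
            oreg = 0                        # cleared ready for the next instruction
    return out
```

Encoding ldc -31 gives the bytes #61 #41 (nfix #1, ldc #1), and running those bytes through the decoder yields the operand -31 again.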


The use of prefix instructions has certain beneficial consequences. Firstly, they are decoded and executed 
in the same way as every other instruction, which simplifies and speeds instruction decoding. Secondly, they 
simplify language compilation by providing a completely uniform way of allowing any instruction to take an 
operand of any size. Thirdly, they allow operands to be represented in a form independent of the processor 
wordlength. 


3.2.3 Indirect functions 


The remaining function code, operate, causes its operand to be interpreted as an operation on the values 
held in the evaluation stack. This allows up to 16 such operations to be encoded in a single byte instruction. 
However, the prefix instructions can be used to extend the operand of an operate instruction just like any 
other. The instruction representation therefore provides for an indefinite number of operations. 


Encoding of the indirect functions is chosen so that the most frequently occurring operations are represented 
without the use of a prefix instruction. These include arithmetic, logical and comparison operations such as 
add, exclusive or and greater than. Less frequently occurring operations have encodings which require a 
single prefix operation. 
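Encoding an indirect function reuses the same prefix mechanism on the operate function code. The sketch assumes opr = #F and pfix = #2 and handles only non-negative operation codes; its lookup table holds just the two operation codes quoted in the instruction set summary (add = #5, ladd = #16) and is deliberately incomplete.

```python
PFIX, OPR = 0x2, 0xF  # function codes of prefix and operate

OPERATIONS = {0x05: "add", 0x16: "ladd"}  # only the two examples quoted in the text


def encode_opr(op_code: int) -> list[int]:
    """Bytes for an indirect function: one pfix per extra four bits, then opr."""
    if op_code < 0:
        raise ValueError("operation codes are non-negative")
    code = [(OPR << 4) | (op_code & 0xF)]
    high = op_code >> 4
    while high:
        code.insert(0, (PFIX << 4) | (high & 0xF))  # most significant prefix first
        high >>= 4
    return code


def decode_opr(code: bytes) -> str:
    """Accumulate the operand register over pfix bytes and name the operation at opr."""
    oreg = 0
    for byte in code:
        fn, data = byte >> 4, byte & 0xF
        oreg |= data
        if fn == PFIX:
            oreg <<= 4
        elif fn == OPR:
            return OPERATIONS.get(oreg, f"operation #{oreg:X}")
        else:
            raise ValueError(f"unexpected function code #{fn:X}")
    raise ValueError("no opr byte found")
```

A frequently used operation such as add fits in one byte (#F5), while ladd needs a single prefix (#21 #F6).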


3.2.4 Expression evaluation 


Evaluation of expressions sometimes requires use of temporary variables in the workspace, but the number 
of these can be minimised by careful choice of the evaluation order. 
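One standard way to choose the evaluation order is to evaluate first the operand that needs more stack registers (Sethi-Ullman numbering), swapping the operands back with rev (reverse) when the operator is not commutative. The sketch below illustrates that technique; it is not presented as the INMOS compiler's algorithm. Variable names stand in for workspace offsets, and ldl, add, sub, mul and rev are used with the convention that sub computes B - A.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Var:
    name: str  # stands for a workspace location loaded with ldl


@dataclass
class Op:
    op: str  # "add", "sub" or "mul"
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Op]
COMMUTATIVE = {"add", "mul"}


def need(e: Expr) -> int:
    """Stack registers needed to evaluate e without workspace temporaries."""
    if isinstance(e, Var):
        return 1
    l, r = need(e.left), need(e.right)
    return max(l, r) if l != r else l + 1


def compile_expr(e: Expr) -> list[str]:
    if isinstance(e, Var):
        return [f"ldl {e.name}"]
    if need(e.right) > need(e.left):
        # Deeper operand first; afterwards it sits in B with the left operand in A.
        code = compile_expr(e.right) + compile_expr(e.left)
        if e.op not in COMMUTATIVE:
            code.append("rev")  # restore left operand in B, right operand in A
    else:
        code = compile_expr(e.left) + compile_expr(e.right)
    return code + [e.op]


def run(code: list[str], env: dict[str, int]) -> int:
    """Execute on a three-deep stack, failing if a fourth value would be needed."""
    stack: list[int] = []  # stack[-1] is A, stack[-2] is B
    for ins in code:
        if ins.startswith("ldl "):
            stack.append(env[ins[4:]])
            if len(stack) > 3:
                raise OverflowError("would need a workspace temporary")
        elif ins == "rev":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        else:
            a, b = stack.pop(), stack.pop()
            stack.append({"add": b + a, "sub": b - a, "mul": b * a}[ins])
    return stack[-1]
```

For a - b*(c+d), naive left-to-right evaluation needs four registers, whereas the reordered sequence ldl c, ldl d, add, ldl b, mul, ldl a, rev, sub never holds more than two values.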


Table 3.2 Expression evaluation 


Program Mnemonic 


3.2.5 Efficiency of encoding 


Measurements show that about 70% of executed instructions are encoded in a single byte; that is, without 
the use of prefix instructions. Many of these instructions, such as load constant and add, require just one 
processor cycle. 


The instruction representation gives a more compact representation of high level language programs than 
more conventional instruction sets. Since a program requires less store to represent it, less of the memory 
bandwidth is taken up with fetching instructions. Furthermore, as memory is word accessed the processor 
will receive four instructions for every fetch. 


Short instructions also improve the effectiveness of instruction pre-fetch, which in turn improves processor 
performance. There is an extra word of pre-fetch buffer, so the processor rarely has to wait for an instruction 
fetch before proceeding. Since the buffer is short, there is little time penalty when a jump instruction causes 
the buffer contents to be discarded. 
3.3 Processes and concurrency 


A process starts, performs a number of actions, and then either stops without completing or terminates 
having completed. Typically, a process is a sequence of instructions. A transputer can run several processes in 
parallel (concurrently). Processes may be assigned either high or low priority, and there may be any number 
of each (page 10). 


The processor has a microcoded scheduler which enables any number of concurrent processes to be exe- 
cuted together, sharing the processor time. This removes the need for a software kernel. 


At any time, a concurrent process may be 


Active    - Being executed.
          - On a list waiting to be executed.

Inactive  - Ready to input.
          - Ready to output.
          - Waiting until a specified time.


The scheduler operates in such a way that inactive processes do not consume any processor time. It allocates 
a portion of the processor’s time to each process in turn. Active processes waiting to be executed are held 
in two linked lists of process workspaces, one of high priority processes and one of low priority processes 
(page 10). Each list is implemented using two registers, one of which points to the first process in the list, 
the other to the last. In the Linked Process List figure 3.3, process S is executing and P, Q and R are active, 
awaiting execution. Only the low priority process queue registers are shown; the high priority process ones 
perform in a similar manner. 


Figure 3.3 Linked process list (registers FPtr1 (Front) and BPtr1 (Back) point to the first and last process workspaces on the low priority list)


Table 3.3 Priority queue control registers 


Function                                  High Priority   Low Priority
Pointer to front of active process list   FPtr0           FPtr1
Pointer to back of active process list    BPtr0           BPtr1
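The list in figure 3.3 behaves as a first-in, first-out queue threaded through the process workspaces and held by a front and a back pointer. A minimal model, assuming each workspace has a hypothetical `link` field naming the next process (the actual workspace layout is not given in this section):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workspace:
    name: str
    iptr: int = 0                       # instruction pointer saved on descheduling
    link: Optional["Workspace"] = None  # next workspace on the list (hypothetical field)


class ProcessList:
    """Active process list held as front (FPtr) and back (BPtr) pointers."""

    def __init__(self) -> None:
        self.fptr: Optional[Workspace] = None
        self.bptr: Optional[Workspace] = None

    def enqueue(self, ws: Workspace) -> None:
        # A process made active is always added at the back of the list.
        ws.link = None
        if self.fptr is None:
            self.fptr = ws
        else:
            self.bptr.link = ws
        self.bptr = ws

    def dequeue(self) -> Optional[Workspace]:
        # The next process to execute is taken from the front.
        ws = self.fptr
        if ws is not None:
            self.fptr = ws.link
            if self.fptr is None:
                self.bptr = None
        return ws

    def names(self) -> list[str]:
        out, ws = [], self.fptr
        while ws is not None:
            out.append(ws.name)
            ws = ws.link
        return out
```

In the figure's situation, when the executing process S is timesliced it is appended behind R, and P is taken from the front to run next.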


Each process runs until it has completed its action, but is descheduled whilst waiting for communication from 
another process or transputer, or for a time delay to complete. In order for several processes to operate in 
parallel, a low priority process is only permitted to run for a maximum of two time slices before it is forcibly 
descheduled at the next descheduling point (page 13). The time slice period is 5120 cycles of the external 
5 MHz clock, giving ticks approximately 1 ms apart. 
A process can only be descheduled on certain instructions, known as descheduling points (page 13). As a 
result, an expression evaluation can be guaranteed to execute without the process being timesliced part way 
through. 


Whenever a process is unable to proceed, its instruction pointer is saved in the process workspace and 
the next process taken from the list. Process scheduling pointers are updated by instructions which cause 
scheduling operations, and should not be altered directly. Actual process switch times are less than 1 µs, as 
little state needs to be saved and it is not necessary to save the evaluation stack on rescheduling. 


The processor provides a number of special operations to support the process model, including start process 
and end process. When a main process executes a parallel construct, start process instructions are used 
to create the necessary additional concurrent processes. A start process instruction creates a new process 
by adding a new workspace to the end of the scheduling list, enabling the new concurrent process to be 
executed together with the ones already being executed. When a process is made active it is always added 
to the end of the list, and thus cannot pre-empt processes already on the same list. 


The correct termination of a parallel construct is assured by use of the end process instruction. This uses 
a workspace location as a counter of the parallel construct components which have still to terminate. The 
counter is initialised to the number of components before the processes are started. Each component ends 
with an end process instruction which decrements and tests the counter. For all but the last component, the 
counter is non zero and the component is descheduled. For the last component, the counter is zero and the 
main process continues. 
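The counting scheme can be illustrated with a hypothetical `ParallelJoin` class standing in for the workspace counter. In the real instruction the last component simply continues as the main process; here that is represented by a returned flag.

```python
class ParallelJoin:
    """Workspace counter used by end process to detect the last component."""

    def __init__(self, components: int) -> None:
        self.count = components  # initialised before the components are started

    def end_process(self) -> bool:
        """Decrement and test: True means this was the last component to finish."""
        self.count -= 1
        return self.count == 0
```

For a parallel construct with three components, the first two calls return False (those components are descheduled) and the third returns True (the main process continues).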


3.4 Priority 


The IMS T800 supports two levels of priority. Priority 1 (low priority) processes are executed whenever there 
are no active priority 0 (high priority) processes. 


High priority processes are expected to execute for a short time. If one or more high priority processes are 
able to proceed, then one is selected and runs until it has to wait for a communication, a timer input, or until 
it completes processing. 


If no process at high priority is able to proceed, but one or more processes at low priority are able to proceed, 
then one is selected. 


Low priority processes are periodically timesliced to provide an even distribution of processor time between 
computationally intensive tasks. 


If there are n low priority processes, then the maximum latency from the time at which a low priority process 
becomes active to the time when it starts processing is 2n-2 timeslice periods. It is then able to execute for 
between one and two timeslice periods, less any time taken by high priority processes. This assumes that 
no process monopolises the transputer’s time; i.e. it has a distribution of descheduling points (page 13). 


Each timeslice period lasts for 5120 cycles of the external 5 MHz input clock (approximately 1 ms at the 
standard frequency of 5 MHz). 
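Putting numbers to the latency bound: with the standard 5 MHz input clock, one timeslice period is 5120 × 200 ns = 1.024 ms, so the 2n-2 bound grows by just over 2 ms for each additional low priority process. The sketch below (function names illustrative) ignores time taken by high priority processes, which must be added as the text notes.

```python
INPUT_CLOCK_HZ = 5_000_000  # standard 5 MHz input clock
TIMESLICE_CYCLES = 5120     # input clock cycles per timeslice period


def timeslice_ns() -> int:
    """Length of one timeslice period in nanoseconds."""
    return TIMESLICE_CYCLES * 1_000_000_000 // INPUT_CLOCK_HZ


def max_start_latency_ns(n: int) -> int:
    """Upper bound of 2n-2 timeslice periods before one of n low priority processes starts.

    Time taken by high priority processes is not included.
    """
    return (2 * n - 2) * timeslice_ns()
```

With three computationally intensive low priority processes, a newly active one waits at most four periods (about 4.1 ms) before starting.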


If a high priority process is waiting for an external channel to become ready, and if no other high priority 
process is active, then the interrupt latency (from when the channel becomes ready to when the process 
starts executing) is typically 19 processor cycles, with a maximum of 78 cycles (assuming use of on-chip RAM). 
If the floating point unit is not being used at the time then the maximum interrupt latency is only 58 cycles. 
To ensure this latency, certain instructions are interruptible. 
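Converting these cycle counts to time at a given processor speed is a single multiplication. At 30 MHz the typical 19 cycles come to about 633 ns, which agrees with the 630 ns interrupt response quoted in the features list. The function name is illustrative.

```python
def latency_ns(cycles: int, processor_mhz: float) -> float:
    """Convert a latency in processor cycles to nanoseconds."""
    return cycles * 1000.0 / processor_mhz


for label, cycles in [("typical", 19), ("max, FPU idle", 58), ("max", 78)]:
    print(f"{label:>14}: {latency_ns(cycles, 30):7.1f} ns at 30 MHz")
```

At 30 MHz the worst case of 78 cycles is 2.6 µs, and 58 cycles (floating point unit idle) is about 1.9 µs.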


3.5 Communications


Communication between processes is achieved by means of channels. Process communication is point-to-point, 
synchronised and unbuffered. As a result, a channel needs no process queue, no message queue and 
no message buffer. 
A channel between two processes executing on the same transputer is implemented by a single word in 
memory; a channel between processes executing on different transputers is implemented by point-to-point 
links. The processor provides a number of operations to support message passing, the most important being 
input message and output message. 


The input message and output message instructions use the address of the channel to determine whether 
the channel is internal or external. Thus the same instruction sequence can be used for both, allowing a 
process to be written and compiled without knowledge of where its channels are connected. 


The process which first becomes ready must wait until the second one is also ready. A process performs an 
input or output by loading the evaluation stack with a pointer to a message, the address of a channel, and 
a count of the number of bytes to be transferred, and then executing an input message or output message 
instruction. Data is transferred if the other process is ready. If the channel is not ready or is an external one, 
the process will deschedule. 
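A behavioural sketch of an internal channel, assuming the channel word simply records whichever process arrives first (how the hardware encodes an empty channel in that word is not described here). It shows why no queue or buffer is needed: at most one process can be waiting, and the message is copied directly between the two processes' message areas. The class and method names are hypothetical.

```python
from typing import Optional


class Channel:
    """Internal channel: a single word that is empty or names the waiting process."""

    def __init__(self) -> None:
        self.word: Optional[tuple] = None  # (process, message buffer, is_output)
        self.rescheduled: list = []        # stand-in for the scheduler's active list

    def _communicate(self, proc, buf: bytearray, is_output: bool) -> bool:
        """Return True if the process can continue, False if it must deschedule."""
        if self.word is None:
            # First process to become ready: record it and wait for the partner.
            self.word = (proc, buf, is_output)
            return False
        other, other_buf, other_is_output = self.word
        if other_is_output == is_output:
            raise RuntimeError("channel used for two inputs or two outputs at once")
        # Second process: copy the message directly, with no intermediate buffer.
        src, dst = (other_buf, buf) if other_is_output else (buf, other_buf)
        dst[:len(src)] = src
        self.word = None
        self.rescheduled.append(other)  # the waiting partner becomes active again
        return True

    def output(self, proc, buf: bytearray) -> bool:
        return self._communicate(proc, buf, True)

    def input(self, proc, buf: bytearray) -> bool:
        return self._communicate(proc, buf, False)
```

Whichever side arrives first receives False and deschedules; the second receives True, performs the copy and returns the first process to the active list, regardless of whether the inputting or the outputting process was first.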


3.6 Timers 


The transputer has two 32 bit timer clocks which ‘tick’ periodically. The timers provide accurate process 
timing, allowing processes to deschedule themselves until a specific time. 


One timer is accessible only to high priority processes and is incremented every microsecond, cycling 
completely in approximately 4295 seconds. The other is accessible only to low priority processes and is 
incremented every 64 microseconds, giving exactly 15625 ticks in one second. It has a full period of approximately 
76 hours. 
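Both periods follow from the 32 bit width of the timers: 2^32 ticks of 1 µs is about 4295 seconds, and 2^32 ticks of 64 µs is about 76.4 hours. A sketch of the arithmetic (names illustrative):

```python
TIMER_BITS = 32


def wrap_seconds(tick_us: int) -> float:
    """Time for a 32 bit timer incremented every tick_us microseconds to cycle completely."""
    return (2 ** TIMER_BITS) * tick_us / 1_000_000


high_priority_wrap_s = wrap_seconds(1)           # about 4295 seconds
low_priority_wrap_h = wrap_seconds(64) / 3600    # about 76.4 hours
low_priority_ticks_per_s = 1_000_000 // 64       # 15625, with no remainder
```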


Table 3.4 Timer registers 


Clock0       Current value of high priority (level 0) process clock
Clock1       Current value of low priority (level 1) process clock
TNextReg0    Indicates time of earliest event on high priority (level 0) timer queue
TNextReg1    Indicates time of earliest event on low priority (level 1) timer queue


The current value of the processor clock can be read by executing a load timer instruction. A process can 
arrange to perform a timer input, in which case it will become ready to execute after a specified time has 
been reached. The timer input instruction requires a time to be specified. If this time is in the ‘past’ then the 
instruction has no effect. If the time is in the ‘future’ then the process is descheduled. When the specified 
time is reached the process is scheduled again. 
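A sketch of the timer queue shown in figure 3.4, with two processes waiting for times 21 and 31. It assumes that 'past' and 'future' are decided by a modular comparison on the wrapping 32 bit clock, in the way occam's AFTER operator is defined; the exact rule used by the hardware, and the names `TimerQueue`, `timer_input` and `tick`, are assumptions for illustration.

```python
import heapq

MASK32 = 0xFFFFFFFF


def after(a: int, b: int) -> bool:
    """True if time a is later than time b on a wrapping 32 bit clock."""
    diff = (a - b) & MASK32
    return 0 < diff < 0x80000000


class TimerQueue:
    """Processes waiting for a time; the earliest time plays the role of TNextReg."""

    def __init__(self, now: int = 0) -> None:
        self.now = now
        # Heap ordered by raw time: assumes waiting times do not straddle a clock wrap.
        self.queue: list = []

    def timer_input(self, proc, time: int) -> bool:
        """Return True if the process continues, False if it is descheduled."""
        if not after(time, self.now):
            return True   # time already reached ('past'): no effect
        heapq.heappush(self.queue, (time, proc))
        return False      # time in the 'future': wait on the timer queue

    def tnext(self):
        """Time of the earliest event on the queue, or None if it is empty."""
        return self.queue[0][0] if self.queue else None

    def tick(self, now: int) -> list:
        """Advance the clock and return the processes whose time has been reached."""
        self.now = now
        ready = []
        while self.queue and not after(self.queue[0][0], now):
            ready.append(heapq.heappop(self.queue)[1])
        return ready
```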


Figure 3.4 shows two processes waiting on the timer queue, one waiting for time 21, the other for time 31. 


Figure 3.4 Timer registers (Timer0, TNextReg0 and TPtrLoc, with two processes waiting on the timer queue)
4 Instruction set summary 


The Function Codes table 4.8 gives the basic function code set (page 7). Where the operand is less than 16, 
a single byte encodes the complete instruction. If the operand is greater than 15, one prefix instruction (pfix) 
is required for each additional four bits of the operand. If the operand is negative the first prefix instruction 
will be nfix. 


Table 4.1 prefix coding


Mnemonic                   Function code   Memory code

ldc #3                     #4              #43

ldc #35
  is coded as
  pfix #3                  #2              #23
  ldc #5                   #4              #45

ldc #987
  is coded as
  pfix #9                  #2              #29
  pfix #8                  #2              #28
  ldc #7                   #4              #47

ldc -31 (ldc #FFFFFFE1)
  is coded as
  nfix #1                  #6              #61
  ldc #1                   #4              #41


Tables 4.9 to 4.27 give details of the operation codes. Where an operation code is less than 16 (e.g. add: 
operation code 05), the operation can be stored as a single byte comprising the operate function code F and 
the operand (5 in the example). Where an operation code is greater than 15 (e.g. ladd: operation code 16), 
the prefix function code 2 is used to extend the instruction. 


Table 4.2 operate coding


Mnemonic                   Function code   Memory code

add (op. code #5)
  is coded as
  opr add                  #F              #F5

ladd (op. code #16)
  is coded as
  pfix #1                  #2              #21
  opr #6                   #F              #F6


In the Floating Point Operation Codes tables 4.21 to 4.27, a selector sequence code (page 21) is indicated 
in the Memory Code column by s. The code given in the Operation Code column is the indirection code, the 
operand for the ldc instruction. 


The FPU and processor operate concurrently, so the actual throughput of floating point instructions is better 
than that implied by simply adding up the instruction times. For full details see The Transputer Instruction Set 
- A Compiler Writers’ Guide. 
The Processor Cycles column refers to the number of periods TPCLPCL taken by an instruction executing 
in internal memory. The number of cycles is given for the basic operation only; where the memory code 
for an instruction is two bytes, the time for the prefix function (one cycle) should be added. For a 20 MHz 
transputer one cycle is 50 ns. Some instruction times vary. Where a letter is included in the cycles column it 
is interpreted from table 4.3. 


Table 4.3 Instruction set interpretation


Ident   Interpretation
b       Bit number of the highest bit set in register A. Bit 0 is the least significant bit.
m       Bit number of the highest bit set in the absolute value of register A.
        Bit 0 is the least significant bit.
n       Number of places shifted.
w       Number of words in the message. Part words are counted as full words. If the message
        is not word aligned the number of words is increased to include the part words at either
        end of the message.
p       Number of words per row.
r       Number of rows.
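The rule for the number of words in a message amounts to counting every word the message touches, including partial words at either end. A sketch assuming 4 byte words and byte addresses (the function name is illustrative):

```python
def message_words(address: int, length: int, word_bytes: int = 4) -> int:
    """Words touched by a message of `length` bytes starting at byte `address`."""
    if length <= 0:
        return 0
    first = address // word_bytes
    last = (address + length - 1) // word_bytes
    return last - first + 1
```

For example, an aligned 8 byte message touches two words, but the same 8 bytes starting one byte past a word boundary touch three.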


The DE column of the tables indicates the descheduling/error features of an instruction as described in 
table 4.4. 


Table 4.4 Instruction features


Ident   Feature
D       The instruction is a descheduling point
E       The instruction will affect the Error flag
F       The instruction will affect the FP_Error flag


4.1 Descheduling points 


The instructions in table 4.5 are the only ones at which a process may be descheduled (page 9). They are 
also the ones at which the processor will halt if the Analyse pin is asserted (page 27). 


Table 4.5 Descheduling point instructions 


input message     output message    output byte       output word
timer alt wait    timer input       stop on error     alt wait
jump              loop end          end process       stop process
4.2 Error instructions


The instructions in table 4.6 are the only ones which can affect the Error flag (page 28) directly. Note, 
however, that the floating point unit error flag FP_Error is set by certain floating point instructions (page 14), 
and that Error can be set from this flag by fpchkerror. 


Table 4.6 Error setting instructions 


add               add constant             subtract
multiply          fractional multiply      divide            remainder
long add          long subtract            long divide
set error         testerr                  fpchkerror
check word        check subscript from 0   check single      check count from 1


4.3 Floating point errors 


The instructions in table 4.7 are the only ones which can affect the floating point error flag FP_Error (page 21). 
Error is set from this flag by fpchkerror if FP_Error is set. 


Table 4.7 Floating point error setting instructions 


fpadd          fpsub          fpmul          fpdiv
fpldnladdsn    fpldnladddb    fpldnlmulsn    fpldnlmuldb
fpremfirst     fpusqrtfirst   fpgt           fpeq
fpuseterror    fpuclearerror  fptesterror
fpuexpinc32    fpuexpdec32    fpumulby2      fpudivby2
fpur32tor64    fpur64tor32    fpuchki32      fpuchki64
fprtoi32       fpuabs         fpint




Table 4.8 IMS T800 function codes


Function code   Mnemonic   Name
0               j          jump
1               ldlp       load local pointer
2               pfix       prefix
3               ldnl       load non-local
4               ldc        load constant
5               ldnlp      load non-local pointer
6               nfix       negative prefix
7               ldl        load local
8               adc        add constant
9               call       call
A               cj         conditional jump (not taken)
A               cj         conditional jump (taken)
B               ajw        adjust workspace
C               eqc        equals constant
D               stl        store local
E               stnl       store non-local
F               opr        operate


or 
exclusive or 
bitwise not 
shift left 
shift right 


add 

subtract 

multiply 

fractional multiply (no rounding) 
fractional multiply (rounding) 
divide 

remainder 

greater than 

difference 

sum 

product for positive register A 
product for negative register A 
Table 4.10 IMS T800 long arithmetic operation codes 


long add 
long subtract 
long sum 
long diff 
long multiply 
long divide 
long shift left (n<32) 
long shift left (n≥32) 
long shift right (n<32) 
long shift right (n≥32) 
normalise (n<32) 
normalise (n≥32) 
normalise (n=64) 


reverse 

extend to word 
check word 

extend to double 
check single 
minimum integer 
duplicate top of stack 


Table 4.12 IMS T800 2D block move operation codes


Mnemonic        Processor cycles   Name
move2dinit                         initialise data for 2D block move
move2dall                          2D block copy
move2dnonzero   (2p+23)*r          2D block copy non-zero bytes
move2dzero      (2p+23)*r          2D block copy zero bytes
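As a worked example of the (2p+23)*r timing, consider a masked copy of a block 640 bytes wide and 480 rows high, chosen purely for illustration and assuming 4 byte words (so p = 160). It takes 343 × 480 = 164,640 cycles, or about 5.5 ms at 30 MHz:

```python
def move2d_cycles(words_per_row: int, rows: int) -> int:
    """Cycle count (2p+23)*r quoted for move2dnonzero and move2dzero."""
    return (2 * words_per_row + 23) * rows


def cycles_to_ms(cycles: int, processor_mhz: float) -> float:
    """Convert processor cycles to milliseconds at the given clock speed."""
    return cycles / (processor_mhz * 1000.0)


cycles = move2d_cycles(640 // 4, 480)  # 640 byte rows = 160 words (4 byte words assumed)
print(cycles, f"{cycles_to_ms(cycles, 30):.3f} ms")
```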


Table 4.13 IMS T800 CRC and bit operation codes 


crcword calculate crc on word 
crcbyte calculate crc on byte 


bitcnt count bits set in word 
bitrevword reverse bits in word 
bitrevnbits reverse bottom n bits in word 
Table 4.14 IMS T800 indexing/array operation codes 



byte subscript 

word subscript 

form double word subscript 


byte count 


word count 
load byte 
store byte 


move message 


Table 4.15 IMS T800 timer handling operation codes 


ldtimer       load timer
tin           timer input (time future)
tin           timer input (time past)
talt          timer alt start
taltwt        timer alt wait (time past)
taltwt        timer alt wait (time future)
enbt          enable timer
dist          disable timer


Table 4.16 IMS T800 input/output operation codes 


in            input message
out           output message
outword       output word
outbyte       output byte
alt           alt start
altwt         alt wait (channel ready)
altwt         alt wait (channel not ready)
altend        alt end
enbs          enable skip
diss          disable skip
resetch       reset channel
enbc          enable channel (ready)
enbc          enable channel (not ready)
disc          disable channel


Table 4.17 IMS T800 control operation codes 




return 
load pointer to instruction 
general adjust workspace 
general call 

loop end (loop) 

loop end (exit) 


Table 4.18 IMS T800 scheduling operation codes 


startp        start process
endp          end process
runp          run process
stopp         stop process
ldpri         load current priority


Table 4.19 IMS T800 error handling operation codes


csub0         check subscript from 0
ccnt1         check count from 1
testerr       test error false and clear (no error)
testerr       test error false and clear (error)
seterr        set error
stoperr       stop on error (no error)
clrhalterr    clear halt-on-error
sethalterr    set halt-on-error
testhalterr   test halt-on-error


Table 4.20 IMS T800 processor initialisation operation codes


testpranal    test processor analysing
saveh         save high priority queue registers
savel         save low priority queue registers
sthf          store high priority front pointer
sthb          store high priority back pointer
stlf          store low priority front pointer
stlb          store low priority back pointer
sttimer       store timer
Table 4.21 IMS T800 floating point load/store operation codes


fpldnlsn      fp load non-local single
fpldnldb      fp load non-local double
fpldnlsni     fp load non-local indexed single
fpldnldbi     fp load non-local indexed double
fpldzerosn    load zero single
fpldzerodb    load zero double
fpldnladdsn   fp load non local & add single
fpldnladddb   fp load non local & add double
fpldnlmulsn   fp load non local & multiply single
fpldnlmuldb   fp load non local & multiply double
fpstnlsn      fp store non-local single
fpstnldb      fp store non-local double
fpstnli32     store non-local int32


Processor cycles are shown as Typical/Maximum cycles. 


Table 4.22 IMS T800 floating point general operation codes 


Mnemonic      Description
fpentry       floating point unit entry
fprev         fp reverse
fpdup         fp duplicate


Table 4.23 IMS T800 floating point rounding operation codes 


Mnemonic      Description
fpurn         set rounding mode to round nearest
fpurz         set rounding mode to round zero
fpurp         set rounding mode to round positive
fpurm         set rounding mode to round minus

fpchkerror    check fp error
fptesterror   test fp error false and clear
fpuseterror   set fp error
fpuclearerror clear fp error
Table 4.25 IMS T800 floating point comparison operation codes


Mnemonic      Description
fpgt          fp greater than
fpeq          fp equality
fpordered     fp orderability
fpnan         fp NaN
fpnotfinite   fp not finite
fpuchki32     check in range of type int32
fpuchki64     check in range of type int64


Processor cycles are shown as Typical/Maximum cycles.


Table 4.26 IMS T800 floating point conversion operation codes 


Mnemonic      Description
fpur32tor64   real32 to real64
fpur64tor32   real64 to real32
fprtoi32      real to int32
fpi32tor32    int32 to real32
fpi32tor64    int32 to real64
fpb32tor64    bit32 to real64
fpunoround    real64 to real32, no round
fpint         round to floating integer


Processor cycles are shown as Typical/Maximum cycles.


Table 4.27 IMS T800 floating point arithmetic operation codes 


Mnemonic      Description                   Single   Double
fpadd         fp add                        6/9
fpsub         fp subtract
fpmul         fp multiply
fpdiv         fp divide
fpuabs        fp absolute
fpremfirst    fp remainder first step
fpremstep     fp remainder iteration
fpusqrtfirst  fp square root first step
fpusqrtstep   fp square root step
fpusqrtlast   fp square root end
fpuexpinc32   multiply by 2^32
fpuexpdec32   divide by 2^32
fpumulby2     multiply by 2.0
fpudivby2     divide by 2.0


Processor cycles are shown as Typical/Maximum cycles.
5 Floating point unit


The 64 bit FPU provides single and double length arithmetic conforming to the floating point standard ANSI-IEEE 754-1985.
It is able to perform floating point arithmetic concurrently with the central processor unit (CPU), sustaining in 
excess of 2.25 Mflops on a 30 MHz device. All data communication between memory and the FPU occurs 
under control of the CPU. 


The FPU consists of a microcoded computing engine with a three deep floating point evaluation stack for 
manipulation of floating point numbers. These stack registers are FA, FB and FC, each of which can hold 
either 32 bit or 64 bit data; an associated flag, set when a floating point value is loaded, indicates which. The 
stack behaves in a similar manner to the CPU stack (page 6). 


As with the CPU stack, the FPU stack is not saved when rescheduling (page 9) occurs. The FPU can be 
used in both low and high priority processes. When a high priority process interrupts a low priority one 
the FPU state is saved inside the FPU. The CPU will service the interrupt immediately on completing its 
current operation. The high priority process will not start, however, before the FPU has completed its current 
operation. 


Points in an instruction stream where data need to be transferred to or from the FPU are called synchronisation 
points. At a synchronisation point the first processing unit to become ready will wait until the other is ready. 
The data transfer will then occur and both processors will proceed concurrently again. In order to make 
full use of concurrency, floating point data source and destination addresses can be calculated by the CPU 
whilst the FPU is performing operations on a previous set of data. Device performance is thus optimised by 
minimising the CPU and FPU idle times. 


The FPU has been designed to operate on both single length (32 bit) and double length (64 bit) floating 
point numbers, and returns results which fully conform to the ANSI-IEEE 754-1985 floating point arithmetic 
standard. Denormalised numbers are fully supported in the hardware. All rounding modes defined by the 
standard are implemented, with the default being round to nearest. 


The basic addition, subtraction, multiplication and division operations are performed by single instructions. 
However, certain less frequently used floating point instructions are selected by a value in register A (when
allocating registers, this should be taken into account). A load constant instruction ldc is used to load
register A; the floating point entry instruction fpentry then uses this value to select the floating point operation.
This pair of instructions is termed a selector sequence. 


Names of operations which use fpentry begin with fpu. A typical usage, returning the absolute value of a 
floating point number, would be 


ldc fpuabs; fpentry;


Since the indirection code for fpuabs is #0B, it would be encoded as


Table 5.1 fpentry coding


Mnemonic                  Function code   Memory code
ldc fpuabs                #4              #4B
fpentry (op. code #AB)
  is coded as
pfix #A                   #2              #2A
opr #B                    #F              #FB



The remainder and square root instructions take considerably longer than other instructions to complete. In 
order to minimise the interrupt latency period of the transputer they are split up to form instruction sequences. 
As an example, the instruction sequence for a single length square root is 


fpusqrtfirst; fpusqrtstep; fpusqrtstep; fpusqrtlast;
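To illustrate why a long operation is split into a fixed sequence of shorter instructions, here is a Python sketch of a generic restoring (digit-by-digit) integer square root broken into first, step, step and last calls. It is an analogy under stated assumptions, not the FPU's microcode: the 32 bit operand width, the four iterations per call and all names are hypothetical. Because each call does a bounded amount of work, an interrupt could be serviced between calls without waiting for the whole root.

```python
import math
from dataclasses import dataclass

ITERATIONS_PER_CALL = 4  # hypothetical: 4 calls x 4 iterations = 16 root bits


@dataclass
class SqrtState:
    remainder: int
    root: int = 0
    bit: int = 1 << 30  # highest power of four below 2**32


def _iterate(state: SqrtState) -> None:
    """Run a fixed, bounded number of restoring square-root iterations."""
    for _ in range(ITERATIONS_PER_CALL):
        trial = state.root + state.bit
        if state.remainder >= trial:
            state.remainder -= trial
            state.root = (state.root >> 1) + state.bit
        else:
            state.root >>= 1
        state.bit >>= 2


def sqrt_first(n: int) -> SqrtState:
    if not 0 <= n < 1 << 32:
        raise ValueError("operand must be an unsigned 32-bit integer")
    state = SqrtState(remainder=n)
    _iterate(state)
    return state


def sqrt_step(state: SqrtState) -> None:
    _iterate(state)


def sqrt_last(state: SqrtState) -> int:
    _iterate(state)
    assert state.bit == 0  # all 16 root bits have been produced
    return state.root


def sqrt_in_four_calls(n: int) -> int:
    # Same shape as "first; step; step; last": each call is bounded,
    # so other work could run between any two of them.
    state = sqrt_first(n)
    sqrt_step(state)
    sqrt_step(state)
    return sqrt_last(state)


print(sqrt_in_four_calls(1_000_000), math.isqrt(1_000_000))  # 1000 1000
```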


The FPU has its own error flag FP_Error. This reflects the state of evaluation within the FPU and is set in 
circumstances where invalid operations, division by zero or overflow exceptions to the ANSI-IEEE 754-1985 
Standard would be flagged (page 14). FP_Error is also set if an input to a floating point operation is infinite 
or is not a number (NaN). The FP_Error flag can be set, tested and cleared without affecting the main Error 
flag, but can also set Error when required (page 14). Depending on how a program is compiled, it is possible 
for both unchecked and fully checked floating point arithmetic to be performed. 
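A rough Python model of when FP_Error would be raised for double-length arithmetic, following the conditions listed above. It is a hypothetical sketch: Python floats stand in for the FPU's 64 bit format, checked_fp is an invented name, and only the flag is modelled, not the T800's actual results or rounding behaviour.

```python
import math
import operator


def checked_fp(op, a: float, b: float) -> tuple[float, bool]:
    """Apply a double-length binary operation and report an FP_Error-style flag.

    Hypothetical model: the flag is raised when either input is infinite
    or NaN, on division by zero (including 0/0), or when finite inputs
    overflow to an infinite result.
    """
    fp_error = not (math.isfinite(a) and math.isfinite(b))
    try:
        result = op(a, b)
    except ZeroDivisionError:
        # Python raises here; substitute the IEEE 754 default result.
        if a == 0 or math.isnan(a):
            result = math.nan
        else:
            result = math.copysign(math.inf, a) * math.copysign(1.0, b)
        return result, True
    if not math.isfinite(result):
        fp_error = True
    return result, fp_error


print(checked_fp(operator.add, 1.5, 2.0))      # (3.5, False)
print(checked_fp(operator.mul, 1e308, 10.0))   # (inf, True)
print(checked_fp(operator.truediv, 1.0, 0.0))  # (inf, True)
```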


Further details on the operation of the FPU can be found in The Transputer Instruction Set - A Compiler 
Writers’ Guide. 


Table 5.2 Typical floating point operation times for IMS T800 


Operation     IMS T800-20     IMS T800-30
add           350 ns          233 ns
subtract      350 ns          233 ns
multiply      1000 ns         667 ns
divide        1600 ns         1067 ns


Timing is for operations where both operands are normalised fp numbers. 
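Table 5.2 gives two times per operation whose ratio is about 3:2 (for example 350 ns and 233 ns for add), which is consistent with 20 MHz and 30 MHz parts. Assuming that reading, a short Python conversion into processor cycles shows that each operation takes the same number of cycles at both speeds; the function names are hypothetical.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Processor cycle time in nanoseconds."""
    return 1000.0 / clock_mhz


def to_cycles(time_ns: float, clock_mhz: float) -> int:
    """Nearest whole number of processor cycles for an operation time."""
    return round(time_ns / cycle_time_ns(clock_mhz))


# Pairs from Table 5.2, read (an assumption) as 20 MHz and 30 MHz parts.
TIMES_NS = {
    "add": (350, 233),
    "subtract": (350, 233),
    "multiply": (1000, 667),
    "divide": (1600, 1067),
}

for name, (t20, t30) in TIMES_NS.items():
    print(name, to_cycles(t20, 20.0), to_cycles(t30, 30.0))
# add 7 7
# subtract 7 7
# multiply 20 20
# divide 32 32
```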
6 System services


System services include all the necessary logic to initialise and sustain operation of the device. They also 
include error handling and analysis facilities. 


6.1 Power 


Power is supplied to the device via the VCC and GND pins. Several of each are provided to minimise 
inductance within the package. All supply pins must be connected. The supply must be decoupled close to 
the chip by at least one 100 nF low inductance (e.g. ceramic) capacitor between VCC and GND. Four layer 
boards are recommended; if two layer boards are used, extra care should be taken in decoupling. 


Input voltages must not exceed specification with respect to VCC and GND, even during power-up and power- 
down ramping, otherwise latchup can occur. CMOS devices can be permanently damaged by excessive 
periods of latchup. 


6.2 CapPlus, CapMinus 


The internally derived power supply for internal clocks requires an external low leakage, low inductance 1 µF
capacitor to be connected between CapPlus and CapMinus. A ceramic capacitor is preferred, with an
impedance less than 3 ohms between 100 kHz and 10 MHz. If a polarised capacitor is used the negative
terminal should be connected to CapMinus. Total PCB track length should be less than 50 mm. The 
connections must not touch power supplies or other noise sources. 


Figure 6.1 Recommended PLL decoupling (1 µF capacitor between the CapPlus and CapMinus P.C.B. tracks feeding the phase-locked loops)


6.3 Clockin 


Transputer family components use a standard clock frequency, supplied by the user on the Clockin input. 
The nominal frequency of this clock for all transputer family components is 5 MHz, regardless of device type, 
transputer word length or processor cycle time. High frequency internal clocks are derived from Clockin, 
simplifying system design and avoiding problems of distributing high speed clocks externally. 


A number of transputer devices may be connected to a common clock, or may have individual clocks provided
each one meets the specified stability criteria. In a multi-clock system the relative phasing of Clockin clocks
is not important, due to the asynchronous nature of the links. Mark/space ratio is unimportant provided the
specified limits of Clockin pulse widths are met.


Oscillator stability is important. Clockin must be derived from a crystal oscillator; RC oscillators are not 
sufficiently stable. Clockin must not be distributed through a long chain of buffers. Clock edges must be 
monotonic and remain within the specified voltage and time limits. 
Table 6.1 Input clock


SYMBOL     PARAMETER                                      NOTE
TDCLDCH    Clockin pulse width low
TDCHDCL    Clockin pulse width high
TDCLDCL    Clockin period                                 1
TDCerror   Clockin timing error                           2
TDC1DC2    Difference in Clockin for 2 linked devices     3
TDCr       Clockin rise time                              4
TDCf       Clockin fall time                              4


Notes


1 Measured between corresponding points on consecutive falling edges.
2 Variation of individual falling edges from their nominal times.
3 This value allows the use of 200ppm crystal oscillators for two devices connected together by a link.
4 Clock transitions must be monotonic within the range VIH to VIL (table 11.3).




Figure 6.2 Clockin timing 


6.4 ProcSpeedSelect0-2 


Processor speed of the IMS T800 is variable in discrete steps. The desired speed can be selected, up to the 
maximum rated for a particular component, by the three speed select lines ProcSpeedSelect0-2. The pins 
are tied high or low, according to the table below, for the various speeds. The ProcSpeedSelect0-2 pins
are designated HoldToGND on the IMS T414, and coding is so arranged that the IMS T800 can be plugged
directly into a board designed for a 20 MHz IMS T414.


Only six of the possible speed select combinations are currently used; the other two are not valid speed 
selectors. The frequency of Clockin for the speeds given in the table is 5 MHz. 
Table 6.2 Processor speed selection


ProcSpeedSelect2   ProcSpeedSelect1   ProcSpeedSelect0   Processor clock speed (MHz)   Processor cycle time (ns)


Note: Inclusion of a speed selection in this table does not imply immediate availability. 


6.5 Reset 


Reset can go high with VCC, but must at no time exceed the maximum specified voltage for VIH. After VCC is 
valid Clockin should be running for a minimum period TDCVRL before the end of Reset. The falling edge of 
Reset initialises the transputer, triggers the memory configuration sequence and starts the bootstrap routine. 
Link outputs are forced low during reset; link inputs and EventReq should be held low. Memory request 
(DMA) must not occur whilst Reset is high but can occur before bootstrap (page 50). 


After the end of Reset there will be a delay of 144 periods of Clockin (figure 6.3). Following this, the
MemnotWrD0, MemnotRfD1 and MemAD2-31 pins will be scanned to check for the existence of a pre-programmed
memory interface configuration (page 40). This lasts for a further 144 periods of Clockin. Regardless of
whether a configuration was found, 36 configuration read cycles will then be performed on external memory
using the default memory configuration (page 41), in an attempt to access the external configuration ROM.
A delay will then occur, its period depending on the actual configuration. Finally eight complete and
consecutive refresh cycles will initialise any dynamic RAM, using the new memory configuration. If the memory
configuration does not enable refresh of dynamic RAM the refresh cycles will be replaced by an equivalent
delay with no external memory activity.
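The fixed parts of the post-reset sequence can be estimated from the Clockin period. This Python sketch assumes a 5 MHz Clockin (200 ns period), a processor clock chosen by the caller, and configuration reads that use the reset default of four periods Tm for each of the six Tstates with no wait states; the function name is hypothetical, and the later delay and refresh phases are left out because they depend on the configuration.

```python
CLOCKIN_PERIOD_NS = 200.0  # nominal 5 MHz Clockin


def post_reset_phases_us(proc_clock_mhz: float) -> dict[str, float]:
    """Durations (microseconds) of the fixed-length post-reset phases."""
    tm_ns = 0.5 * (1000.0 / proc_clock_mhz)  # Tm is half a ProcClockOut period
    read_cycle_ns = 6 * 4 * tm_ns  # six Tstates of four periods Tm (assumed default)
    return {
        "delay": 144 * CLOCKIN_PERIOD_NS / 1000,
        "internal configuration scan": 144 * CLOCKIN_PERIOD_NS / 1000,
        "external configuration reads": 36 * read_cycle_ns / 1000,
    }


for phase, us in post_reset_phases_us(20.0).items():
    print(f"{phase}: {us:.1f} us")
# delay: 28.8 us
# internal configuration scan: 28.8 us
# external configuration reads: 21.6 us
```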


If BootFromRom is high, bootstrapping will then take place immediately, using data from external memory;
otherwise the transputer will await an input from any link. The processor will be in the low priority state.


Figure 6.3 IMS T800 post-reset sequence (after Reset: delay, internal configuration, external configuration, delay, refresh, boot)


6.6 Bootstrap 


The transputer can be bootstrapped either from a link or from external ROM. To facilitate debugging,
BootFromRom may be dynamically changed but must obey the specified timing restrictions. It is sampled once
only by the transputer, before the first instruction is executed after Reset is taken low.


If BootFromRom is connected high (e.g. to VCC) the transputer starts to execute code from the top two bytes 
in external memory, at address #7FFFFFFE. This location should contain a backward jump to a program in 
ROM. Following this access, BootFromRom may be taken low if required. The processor is in the low priority 
state, and the W register points to MemStart (page 29).


Table 6.3 Reset and Analyse 


SYMBOL     PARAMETER                              UNITS
TPVRH      Power valid before Reset               ms
TRHRL      Reset pulse width high                 Clockin
TDCVRL     Clockin running before Reset end       ms
TAHRH      Analyse setup before Reset             ms
TRLAL      Analyse hold after Reset end           Clockin
TBRVRL     BootFromRom setup                      ms
TRLBRX     BootFromRom hold after Reset           ms
TALBRX     BootFromRom hold after Analyse


Notes


1 Full periods of Clockin TDCLDCL required.
2 At power-on reset.
3 Must be stable until after end of bootstrap period. See Bootstrap section.


Figure 6.4 Transputer reset timing with Analyse low (signals: VCC, Clockin, Reset, BootFromRom)


Figure 6.5 Transputer reset and analyse timing (signals: Analyse, BootFromRom; parameters TBRVRL, TALBRX)
If BootFromRom is connected low (e.g. to GND) the transputer will wait for the first bootstrap message to
arrive on any one of its links. The transputer is ready to receive the first byte on a link within two processor 
cycles TPCLPCL after Reset goes low. 


If the first byte received (the control byte) is greater than 1 it is taken as the quantity of bytes to be input. The 
following bytes, to that quantity, are then placed in internal memory starting at location MemStart. Following 
reception of the last byte the transputer will start executing code at MemStart as a low priority process. 
BootFromRom may be taken high after reception of the last byte, if required. The memory space immediately 
above the loaded code is used as work space. Messages arriving on other links after the control byte has 
been received and on the bootstrapping link after the last bootstrap byte will be retained until a process inputs 
from them. 


6.7 Peek and poke 


Any location in internal or external memory can be interrogated and altered when the transputer is waiting 
for a bootstrap from link. If the control byte is 0 then eight more bytes are expected on the same link. The 
first four byte word is taken as an internal or external memory address at which to poke (write) the second 
four byte word. If the control byte is 1 the next four bytes are used as the address from which to peek (read) 
a word of data; the word is sent down the output channel of the same link. 


Following such a peek or poke, the transputer returns to its previously held state. Any number of accesses 
may be made in this way until the control byte is greater than 1, when the transputer will commence reading 
its bootstrap program. Any link can be used, but addresses and data must be transmitted via the same link 
as the control byte. 
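The host side of the link bootstrap, peek and poke protocol might be sketched as follows. This is illustrative only: the function names are hypothetical, words are assumed to travel least significant byte first (matching the memory byte order described in section 7), and serve is a toy model of the waiting transputer in which unwritten memory reads as zero.

```python
import struct

WORD = struct.Struct("<I")  # assumption: words travel least significant byte first


def boot_message(code: bytes) -> bytes:
    """Control byte > 1 giving the length, followed by the code itself."""
    if not 2 <= len(code) <= 255:
        raise ValueError("a bootstrap must be 2 to 255 bytes long")
    return bytes([len(code)]) + code


def poke_message(address: int, value: int) -> bytes:
    """Control byte 0, then the address word and the data word to write."""
    return b"\x00" + WORD.pack(address & 0xFFFFFFFF) + WORD.pack(value & 0xFFFFFFFF)


def peek_message(address: int) -> bytes:
    """Control byte 1, then the address word; one data word is sent back."""
    return b"\x01" + WORD.pack(address & 0xFFFFFFFF)


def serve(stream: bytes, memory: dict[int, int]) -> tuple[list[int], bytes]:
    """Hypothetical model of a transputer waiting to boot from a link.

    Applies pokes and answers peeks until a control byte greater than 1
    arrives, then returns (peek replies, bootstrap code).
    """
    replies: list[int] = []
    pos = 0
    while True:
        control = stream[pos]
        pos += 1
        if control == 0:
            address, value = struct.unpack_from("<II", stream, pos)
            memory[address] = value
            pos += 8
        elif control == 1:
            (address,) = WORD.unpack_from(stream, pos)
            replies.append(memory.get(address, 0))  # unwritten words read as 0 here
            pos += 4
        else:
            return replies, stream[pos:pos + control]


mem: dict[int, int] = {}
stream = (poke_message(0x80000100, 0x12345678)
          + peek_message(0x80000100)
          + boot_message(b"\xAA\xBB\xCC"))  # placeholder bytes, not real code
print(serve(stream, mem))  # ([305419896], b'\xaa\xbb\xcc')
```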


6.8 Analyse 


If Analyse is taken high when the transputer is running, the transputer will halt at the next descheduling 
point (page 13). From Analyse being asserted, the processor will halt within three time slice periods plus 
the time taken for any high priority process to complete. As much of the transputer status is maintained as is 
necessary to permit analysis of the halted machine. Processor flags Error and HaltOnError are not altered 
at reset, whether Analyse is asserted or not. Memory refresh continues. 


Input links will continue with outstanding transfers. Output links will not make another access to memory 
for data but will transmit only those bytes already in the link buffer. Providing there is no delay in link 
acknowledgement, the links should be inactive within a few microseconds of the transputer halting. 


Reset should not be asserted before the transputer has halted and link transfers have ceased. When Reset 
is taken low whilst Analyse is high, neither the memory configuration sequence nor the block of eight refresh 
cycles will occur; the previous memory configuration will be used for any external memory accesses. If 
BootFromRom is high the transputer will bootstrap as soon as Analyse is taken low, otherwise it will await a 
control byte on any link. If Analyse is taken low without Reset going high the transputer state and operation 
are undefined. After the end of a valid Analyse sequence the registers have the values given in table 6.4. 


Table 6.4 Register values after Analyse 


I    MemStart if bootstrapping from a link, or the external memory bootstrap address if
     bootstrapping from ROM.


W    MemStart if bootstrapping from ROM, or the address of the first free word after the
     bootstrap program if bootstrapping from link.


A    The value of I when the processor halted.


B    The value of W when the processor halted, together with the priority of the process
     when the transputer was halted (i.e. the W descriptor).


C    The ID of the bootstrapping link if bootstrapping from link.
6.9 Error, Errorin


The Error pin carries the OR’ed output of the internal Error flag and the Errorin input. If Error is high 
it indicates either that Errorin is high or that an error was detected in one of the processes. An internal 
error can be caused, for example, by arithmetic overflow, divide by zero, array bounds violation or software 
setting the flag directly (page 14). It can also be set from the floating point unit under certain circumstances 
(page 14, 21). Once set, the Error flag is only cleared by executing the instruction testerr. The error is not 
cleared by processor reset, in order that analysis can identify any errant transputer (page 27). 


A process can be programmed to stop if the Error flag is set; it cannot then transmit erroneous data to other 
processes, but processes which do not require that data can still be scheduled. Eventually all processes 
which rely, directly or indirectly, on data from the process in error will stop through lack of data. Errorin does 
not directly affect the status of a processor in any way. 


By setting the HaltOnError flag the transputer itself can be programmed to halt if Error becomes set. If Error 
becomes set after HaltOnError has been set, all processes on that transputer will cease but will not necessarily
cause other transputers in a network to halt. Setting HaltOnError after Error will not cause the transputer to 
halt; this allows the processor reset and analyse facilities to function with the flags in indeterminate states. 


An alternative method of error handling is to have the errant process or transputer cause all transputers 
to halt. This can be done by ‘daisy-chaining’ the Errorin and Error pins of a number of processors and 
applying the final Error output signal to the EventReq pin of a suitably programmed master transputer. Since 
the process state is preserved when stopped by an error, the master transputer can then use the analyse 
function to debug the fault. When using such a circuit, note that the Error flag is in an indeterminate state on 
power up; the circuit and software should be designed with this in mind. 
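The interaction of Error, Errorin and HaltOnError described above can be captured in a small Python model. It is a hypothetical sketch: the class and method names are invented, testerr is modelled as returning true when Error was clear before clearing it, and the daisy chain assumes the first Errorin is tied low.

```python
from dataclasses import dataclass


@dataclass
class Transputer:
    """Hypothetical model of the Error / HaltOnError rules in section 6.9."""
    error: bool = False
    halt_on_error: bool = False
    halted: bool = False

    def set_error(self) -> None:
        # Halt only if Error changes from 0 to 1 while HaltOnError is set.
        if not self.error and self.halt_on_error:
            self.halted = True
        self.error = True

    def set_halt_on_error(self) -> None:
        # Setting HaltOnError after Error does not halt the transputer.
        self.halt_on_error = True

    def testerr(self) -> bool:
        """Return True if Error was clear, then clear it."""
        was_clear = not self.error
        self.error = False
        return was_clear

    def error_pin(self, errorin: bool) -> bool:
        # The Error pin is the OR of the internal flag and Errorin.
        return self.error or errorin


def chain_error(slaves: list[Transputer]) -> bool:
    """Each Error output drives the next Errorin; the final value would
    feed the EventReq input of the master transputer."""
    signal = False  # Errorin of the first slave, assumed tied low
    for t in slaves:
        signal = t.error_pin(signal)
    return signal


print(chain_error([Transputer(), Transputer(error=True), Transputer()]))  # True
```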


Error checks can be removed completely to optimise the performance of a proven program; any unexpected 
error then occurring will have an arbitrary undefined effect. 


If a high priority process pre-empts a low priority one, status of the Error and HaltOnError flags is saved for 
the duration of the high priority process and restored at the conclusion of it. Status of both flags is transmitted 
to the high priority process. Either flag can be altered in the process without upsetting the error status of any 
complex operation being carried out by the pre-empted low priority process. 


In the event of a transputer halting because of HaltOnError, the links will finish outstanding transfers before 
shutting down. If Analyse is asserted then all inputs continue but outputs will not make another access to 
memory for data. Memory refresh will continue to take place. 


After halting due to the Error flag changing from 0 to 1 whilst HaltOnError is set, register I points two bytes
past the instruction which set Error. After halting due to the Analyse pin being taken high, register I points
one byte past the instruction being executed. In both cases I will be copied to register A.


Figure 6.6 Error handling in a multi-transputer system (the Error output of each slave, slave 0 to slave n, drives the Errorin of the next; the final Error output drives the Event input of the master IMS T800; transputer links not shown)
7 Memory


The IMS T800 has 4 Kbytes of fast internal static memory for high rates of data throughput. Each
internal memory access takes one processor cycle ProcClockOut (page 31). The transputer can also access
4 Gbytes of external memory space. Internal and external memory are part of the same linear address space. 
Internal RAM can be disabled by holding DisableIntRAM high. All internal addresses are then mapped to 
external RAM. This pin should not be altered after Reset has been taken low. 


IMS T800 memory is byte addressed, with words aligned on four-byte boundaries. The least significant byte 
of a word is the lowest addressed byte. 


The bits in a byte are numbered 0 to 7, with bit 0 the least significant. The bytes are numbered from 0, with 
byte 0 the least significant. In general, wherever a value is treated as a number of component values, the 
components are numbered in order of increasing numerical significance, with the least significant component 
numbered 0. Where values are stored in memory, the least significant component value is stored at the 
lowest (most negative) address. 


Internal memory starts at the most negative address #80000000 and extends to #80000FFF. User memory
begins at #80000070; this location is given the name MemStart.


A reserved area at the bottom of internal memory is used to implement link and event channels. 


Two words of memory are reserved for timer use, TPtrLoc0 for high priority processes and TPtrLoc1 for low
priority processes. They either indicate the relevant priority timer is not in use or point to the first process on
the timer queue at that priority level.


Values of certain processor registers for the current low priority process are saved in the reserved IntSaveLoc 
locations when a high priority process pre-empts a low priority one. Other locations are reserved for extended 
features such as block moves and floating point operations. 


External memory space starts at #80001000 and extends up through #00000000 to #7FFFFFFF. Memory
configuration data and ROM bootstrapping code must be in the most positive address space, starting at
#7FFFFF6C and #7FFFFFFE respectively. Address space immediately below this is conventionally used for
ROM based code.
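The address arithmetic of the memory map can be sketched in Python. Assuming addresses are handled as unsigned 32 bit patterns and that DisableIntRAM is low, these hypothetical helpers classify an address, convert it to the signed value the transputer uses, compute word offsets from the base of memory and lay out a word with its least significant byte first. The word offsets agree with the occam map values in figure 7.1 (#1C for MemStart, #0400 for the start of external memory).

```python
INTERNAL_BASE = 0x80000000  # most negative address, base of memory
INTERNAL_TOP = 0x80000FFF
MEMSTART = 0x80000070
EXTERNAL_BASE = 0x80001000


def to_signed(address: int) -> int:
    """Interpret a 32-bit address as a signed value."""
    address &= 0xFFFFFFFF
    return address - (1 << 32) if address & 0x80000000 else address


def is_internal(address: int) -> bool:
    """True if the address falls in the 4 Kbytes of on-chip RAM."""
    return INTERNAL_BASE <= (address & 0xFFFFFFFF) <= INTERNAL_TOP


def word_offset(address: int) -> int:
    """Word offset from the base of memory."""
    return ((address - INTERNAL_BASE) & 0xFFFFFFFF) // 4


def word_bytes(value: int) -> bytes:
    """Memory layout of a word: least significant byte at the lowest address."""
    return (value & 0xFFFFFFFF).to_bytes(4, "little")


print(hex(word_offset(MEMSTART)), hex(word_offset(EXTERNAL_BASE)))  # 0x1c 0x400
```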


Byte address             Machine map                            occam word offset
#7FFFFFFE                ROM bootstrap entry (top two bytes)
#7FFFFF6C                Memory configuration
#80001000                Start of external memory               #0400
#80000070                MemStart                               #1C
#8000006C to #80000048   Reserved for extended functions
#80000044                EregIntSaveLoc
#80000040                STATUSIntSaveLoc
#8000003C                CregIntSaveLoc
#80000038 to #80000004   Reserved words, one every four bytes
#80000000                Base of memory


Figure 7.1 IMS T800 memory map


These locations are used as auxiliary processor registers and should not be manipulated by the user. 
Like processor registers, their contents may be useful for implementing debugging tools (Analyse, 
page 27). For details see The Transputer Instruction Set - A Compiler Writers’ Guide. 
8 External memory interface


The External Memory Interface (EMI) allows access to a 32 bit address space, supporting dynamic and static 
RAM as well as ROM and EPROM. EMI timing can be configured at Reset to cater for most memory types 
and speeds, and a program is supplied with the Transputer Development System to aid in this configuration. 


There are 13 internal configurations which can be selected by a single pin connection (page 40). If none are
suitable the user can configure the interface to specific requirements, as described on page 41.


8.1 ProcClockOut 


This clock is derived from the internal processor clock, which is in turn derived from Clockin. Its period is 
equal to one internal microcode cycle time, and can be derived from the formula 


TPCLPCL = TDCLDCL / PLLx 


where TPCLPCL is the ProcClockOut Period, TDCLDCL is the Clockin Period and PLLx is the phase 
lock loop factor for the relevant speed part, obtained from the ordering details (Ordering section). 


The time value Tm is used to define the duration of Tstates and, hence, the length of external memory cycles;
its value is exactly half the period of one ProcClockOut cycle (0.5 × TPCLPCL), regardless of the mark/space
ratio of ProcClockOut.
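The ProcClockOut formula and the definition of Tm translate directly into code. This Python sketch assumes the nominal 5 MHz Clockin; the function names are hypothetical, and the PLLx values used in the example (4 and 6) are illustrative only, since the real factors come from the ordering details.

```python
CLOCKIN_MHZ = 5.0  # nominal Clockin frequency for all transputer parts


def tpclpcl_ns(pllx: int, clockin_mhz: float = CLOCKIN_MHZ) -> float:
    """ProcClockOut period: TPCLPCL = TDCLDCL / PLLx."""
    tdcldcl_ns = 1000.0 / clockin_mhz  # Clockin period
    return tdcldcl_ns / pllx


def tm_ns(pllx: int, clockin_mhz: float = CLOCKIN_MHZ) -> float:
    """Tm is exactly half of one ProcClockOut period."""
    return 0.5 * tpclpcl_ns(pllx, clockin_mhz)


for pllx in (4, 6):  # hypothetical factors for 20 MHz and 30 MHz parts
    print(pllx, round(tpclpcl_ns(pllx), 1), round(tm_ns(pllx), 1))
# 4 50.0 25.0
# 6 33.3 16.7
```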


Edges of the various external memory strobes coincide with rising or falling edges of ProcClockOut. It should 
be noted, however, that there is a skew associated with each coincidence. The value of skew depends on 
whether coincidence occurs when the ProcClockOut edge and strobe edge are both rising, when both are 
falling or if either is rising when the other is falling. Timing values given in the strobe tables show the best 
and worst cases. If a more accurate timing relationship is required, the exact Tstate timing and strobe edge 
to ProcClockOut relationships should be calculated and the correct skew factors applied from the edge skew 
timing table 8.4. 


The timing parameters in the following tables are based on full characterisation of the 17 MHz and 20 MHz 
parts. Data for higher speeds is based on tests on a limited number of samples and may change when full 
characterisation is completed. 


8.2 Tstates 
The external memory cycle is divided into six Tstates with the following functions: 


T1  Address setup time before address valid strobe.
T2  Address hold time after address valid strobe.
T3  Read cycle tristate or write cycle data setup.
T4  Extendable data setup time.
T5  Read or write data.
T6  Data hold.


Under normal conditions each Tstate may be from one to four periods Tm long, the duration being set during 
memory configuration. The default condition on Reset is that all Tstates are the maximum four periods Tm 
long to allow external initialisation cycles to read slow ROM. 


Period T4 can be extended indefinitely by adding externally generated wait states.


An external memory cycle is always an even number of periods Tm in length and the start of T1 always
coincides with a rising edge of ProcClockOut. If the total configured quantity of periods Tm is an odd
number, one extra period Tm will be added at the end of T6 to force the start of the next T1 to coincide with
a rising edge of ProcClockOut. This period is designated E in configuration diagrams (figure 8.11).
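The rule for external cycle length can be written as a small function. This is a sketch under the stated limits (six Tstates of one to four periods Tm each) that ignores wait states; the function name is hypothetical.

```python
def cycle_length_tm(tstates: tuple[int, ...]) -> int:
    """External memory cycle length in periods Tm, excluding wait states.

    Each of T1..T6 is configured as one to four periods Tm; an odd total
    gains one extra period E after T6 so the next T1 starts on a rising
    edge of ProcClockOut.
    """
    if len(tstates) != 6 or not all(1 <= t <= 4 for t in tstates):
        raise ValueError("need six Tstates of one to four periods Tm each")
    total = sum(tstates)
    return total + 1 if total % 2 else total


print(cycle_length_tm((4, 4, 4, 4, 4, 4)))  # 24 (reset default)
print(cycle_length_tm((1, 1, 1, 1, 1, 1)))  # 6
print(cycle_length_tm((2, 1, 1, 1, 1, 1)))  # 8: 7 configured + E
```

With Tm = 25 ns (a 20 MHz part), the 24 Tm reset default cycle lasts 600 ns.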


Table 8.1 ProcClockOut 


SYMBOL     PARAMETER
TPCLPCL    ProcClockOut period
TPCHPCL    ProcClockOut pulse width high
TPCLPCH    ProcClockOut pulse width low
Tm         ProcClockOut half cycle
TPCstab    ProcClockOut stability


Notes


1 a is TDCLDCL/PLLx.
2 b is 0.5 × TPCLPCL (half the processor clock period).
3 c is TPCLPCL - TPCHPCL.
4 Stability is the variation of cycle periods between two consecutive cycles, measured at corresponding points on
  the cycles.


Figure 8.1 IMS T800 ProcClockOut timing 


8.3 Internal access 

During an internal memory access cycle the external memory interface bus MemAD2-31 reflects the word 
address used to access internal RAM, MemnotWrD0 reflects the read/write operation and MemnotRfD1 is 
high; all control strobes are inactive. This is true unless and until a memory refresh cycle or DMA (memory 
request) activity takes place, when the bus will carry the appropriate external address or data. 


The bus activity is not adequate to trace the internal operation of the transputer in full, but may be used for
hardware debugging in conjunction with peek and poke (page 27).


Figure 8.2 IMS T800 bus activity for internal memory cycle (signals shown: ProcClockOut; MemnotWrD0 for a write followed by two reads; MemnotRfD1; MemAD2-31)




8.4 MemAD2-31 


External memory addresses and data are multiplexed on one bus. Only the top 30 bits of address are
output on the external memory interface, using pins MemAD2-31. They are normally output only during
Tstates T1 and T2, and should be latched during this time. Byte addressing is carried out internally by the 
transputer for read cycles. For write cycles the relevant bytes in memory are addressed by the write strobes 
notMemWrB0-3. 


The data bus is 32 bits wide. It uses MemAD2-31 for the top 30 bits and MemnotRfD1 and MemnotWrD0
for the lower two bits. Read cycle data may be set up on the bus at any time after the start of T3, but must
be valid when the transputer reads it at the end of T5. Data may be removed any time during T6, but must 
be off the bus no later than the end of that period. 


Write data is placed on the bus at the start of T3 and removed at the end of T6. If T6 is extended to force the 
next cycle Tmx (page 33) to start on a rising edge of ProcClockOut, data will be valid during this time also. 
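How one 32 bit bus carries both address and data can be sketched as follows. The helpers are hypothetical; the mapping of notMemWrB0-3 to byte lanes (strobe n for the byte whose address ends in n) is an assumption based on the least-significant-byte-first order described in section 7, and only single bytes and aligned words are handled.

```python
def bus_address_bits(address: int) -> int:
    """Value on MemAD2-31 during T1/T2: address bits 2..31 only."""
    return (address & 0xFFFFFFFF) >> 2


def split_data_word(data: int) -> tuple[int, int, int]:
    """Split a data word across the pins used in T3..T6.

    Returns (MemAD2-31 value, MemnotRfD1 bit, MemnotWrD0 bit).
    """
    data &= 0xFFFFFFFF
    return data >> 2, (data >> 1) & 1, data & 1


def active_write_strobes(address: int, size: int) -> list[str]:
    """notMemWrB0-3 strobes asserted for a write (assumed lane n = byte n)."""
    lane = address & 3
    if size == 4 and lane == 0:
        lanes = list(range(4))
    elif size == 1:
        lanes = [lane]
    else:
        raise ValueError("sketch handles aligned words and single bytes only")
    return [f"notMemWrB{n}" for n in lanes]


print(active_write_strobes(0x80001002, 1))  # ['notMemWrB2']
```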


8.5 MemnotWrD0 


During T1 and T2 this pin will be low if the cycle is a write cycle, otherwise it will be high. During Tstates T3 
to T6 it becomes bit 0 of the data bus. In both cases it follows the general timing of MemAD2-31. 


8.6 MemnotRfD1 


During T1 and T2, this pin is low if the address on MemAD2-31 is a refresh address, otherwise it is high. 
During Tstates T3 to T6 it becomes bit 1 of the data bus. In both cases it follows the general timing of 
MemAD2-31. 


8.7 notMemRd 


For a read cycle the read strobe notMemRd is low during T4 and T5. Data is read by the transputer on the
rising edge of this strobe, and may be removed immediately afterward. If the strobe duration is insufficient it
may be extended by adding extra periods Tm to either or both of the Tstates T4 and T5. Further extension 
may be obtained by inserting wait states at the end of T4. 


In the read cycle timing diagrams ProcClockOut is included as a guide only; it is shown with each Tstate 
configured to one period Tm. 


8.8 notMemS0-4 


To facilitate control of different types of memory and devices, the EMI is provided with five strobe outputs, 
four of which can be configured by the user. The strobes are conventionally assigned the functions shown in 
the read and write cycle diagrams, although there is no compulsion to retain these designations. 


notMemS0 is a fixed format strobe. Its leading edge always coincides with the start of T2 and its trailing
edge always coincides with the end of T5.


The leading edge of notMemS1 is always coincident with the start of T2, but its duration may be configured 
to be from zero to 31 periods Tm. Regardless of the configured duration, the strobe will terminate no later 
than the end of T6. The strobe is sometimes programmed to extend beyond the normal end of Tmx. When 
wait states are inserted into an EMI cycle the end of Tmx is delayed, but the potential active duration of the 
strobe is not altered. Thus the strobe can be configured to terminate relatively early under certain conditions 
(page 48). If notMemS1 is configured to be zero it will never go low. 
notMemS2, notMemS3 and notMemS4 are identical in operation. They all terminate at the end of T5, but
the start of each can be delayed from one to 31 periods Tm beyond the start of T2. If the delay of one of
these strobes would take its start past the end of T5, it will stay high. This can be used to cause a strobe to
become active only when wait states are inserted. If one of these strobes is configured to zero it will never go low.
Figure 8.5 shows the effect of wait states on strobes in more detail; each division on the scale is one period Tm.
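The rules for notMemS1 and notMemS2-4 can be captured in a small model. The sketch below is an illustration rather than a description of the on-chip logic: it returns the interval during which a strobe is low, in periods Tm from the start of T2, or None if the strobe stays high. It follows the text above, including the rule that a strobe configured to zero never goes low; the names are hypothetical.

```python
from typing import Optional, Tuple

Window = Optional[Tuple[int, int]]  # (start, end) in periods Tm from start of T2; None = stays high


def s1_window(d: int, t2: int, t3: int, t4: int, t5: int, t6: int, t_wait: int = 0) -> Window:
    """notMemS1 falls at the start of T2 and lasts d periods, but ends by the end of T6."""
    if not 0 <= d <= 31:
        raise ValueError("notMemS1 duration must be 0-31 periods Tm")
    if d == 0:
        return None                      # configured to zero: never goes low
    end_t6 = t2 + t3 + t4 + t_wait + t5 + t6
    return (0, min(d, end_t6))


def s234_window(delay: int, t2: int, t3: int, t4: int, t5: int, t_wait: int = 0) -> Window:
    """notMemS2-4 fall `delay` periods after the start of T2 and rise at the end of T5."""
    if not 0 <= delay <= 31:
        raise ValueError("strobe delay must be 0-31 periods Tm")
    end_t5 = t2 + t3 + t4 + t_wait + t5
    if delay == 0 or delay >= end_t5:
        return None                      # zero, or start falls past T5: stays high
    return (delay, end_t5)


# With every Tstate one period long, a delay of 5 only takes effect when waits are inserted.
print(s234_window(5, 1, 1, 1, 1), s234_window(5, 1, 1, 1, 1, t_wait=2))  # None (5, 6)
```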


Table 8.2 Read

| SYMBOL | PARAMETER | MIN | NOM | MAX | UNITS | NOTE |
| TaZdV | Address tristate to data valid | | | | ns | |
| TdVRdH | Data setup before read | | | | ns | |
| TRdHdX | Data hold after read | | | | ns | |
| TS0LRdL | notMemS0 before start of read | | | a+2 | ns | 1 |
| TS0HRdH | End of read from end of notMemS0 | | | | ns | |
| TRdLRdH | Read period | | | b+6 | ns | 2 |

Notes
1 a is total of T2+T3 where T2, T3 can be from one to four periods Tm each in length.
2 b is total of T4+Twait+T5 where T4, T5 can be from one to four periods Tm each in length and Twait may be
any number of periods Tm in length.


[Timing diagram over Tstates T1 to T6: MemnotWrD0, MemnotRfD1 and MemAD2-31 (address then data), notMemRd, notMemS0 (CE) and notMemS1 (ALE), annotated with TS0LRdL, TRdLRdH, TS0LS0H, TS0HRdH, TS0LS1L, TS0LS1H and TS0HS1H.]

Figure 8.3 IMS T800 external read cycle: static memory
[Timing diagram over Tstates T1 to T6: ProcClockOut, MemnotWrD0, MemnotRfD1 and MemAD2-31 (address then data), notMemRd, notMemS0 (RAS), notMemS1 (ALE), notMemS2 (AMUX), notMemS3 (CAS) and notMemS4 (wait state), annotated with TaZdV, TRdHdX, TaVS0L, TS0LaX, TdVRdH, TS0LRdL, TRdLRdH, TS0LS0H and the strobe timings for notMemS1-4.]

Figure 8.4 IMS T800 external read cycle: dynamic memory
Table 8.3 IMS T800 strobe timing

| SYMBOL | (n) | PARAMETER | MIN | NOM | MAX | UNITS | NOTE |
| TaVS0L | | Address setup before notMemS0 | | | | ns | 1 |
| TS0LaX | | Address hold after notMemS0 | | | | ns | |
| TS0LS0H | | notMemS0 pulse width low | | | | ns | |
| TS0LS1L | 1 | notMemS1 from notMemS0 | | | | ns | |
| TS0LS1H | 5 | notMemS1 end from notMemS0 | | | | ns | |
| TS0HS1H | 9 | notMemS1 end from notMemS0 end | | | | ns | |
| TS0LS2L | 2 | notMemS2 delayed after notMemS0 | | | | ns | |
| TS0LS2H | 6 | notMemS2 end from notMemS0 | | | | ns | |
| TS0HS2H | 10 | notMemS2 end from notMemS0 end | | | | ns | |
| TS0LS3L | 3 | notMemS3 delayed after notMemS0 | | | | ns | |
| TS0LS3H | 7 | notMemS3 end from notMemS0 | | | | ns | |
| TS0HS3H | 11 | notMemS3 end from notMemS0 end | | | | ns | |
| TS0LS4L | 4 | notMemS4 delayed after notMemS0 | | | | ns | |
| TS0LS4H | 8 | notMemS4 end from notMemS0 | | | | ns | |
| TS0HS4H | 12 | notMemS4 end from notMemS0 end | | | | ns | |
| Tmx | | Complete external memory cycle | | | | ns | |

Notes
1 a is T1 where T1 can be from one to four periods Tm in length.
2 b is T2 where T2 can be from one to four periods Tm in length.
3 c is total of T2+T3+T4+Twait+T5 where T2, T3, T4, T5 can be from one to four periods Tm each in length and
Twait may be any number of periods Tm in length.
4 d can be from zero to 31 periods Tm in length.
5 e can be from -27 to +4 periods Tm in length.
6 If the configuration would cause the strobe to remain active past the end of T6 it will go high at the end of T6.
If the strobe is configured to zero periods Tm it will remain high throughout the complete cycle Tmx.
7 f can be from zero to 31 periods Tm in length. If this length would cause the strobe to remain active past the
end of T5 it will go high at the end of T5. If the strobe value is zero periods Tm it will remain low throughout
the complete cycle Tmx.
8 g is one complete external memory cycle comprising the total of T1+T2+T3+T4+Twait+T5+T6 where T1, T2,
T3, T4, T5 can be from one to four periods Tm each in length, T6 can be from one to five periods Tm in length
and Twait may be zero or any number of periods Tm in length.
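Note 8 can be turned into a quick check of a proposed configuration. The helper below is a sketch with a hypothetical name: it validates each Tstate against the ranges given in the notes and returns the length of a complete cycle Tmx in periods Tm.

```python
def tmx_periods(t1: int, t2: int, t3: int, t4: int, t5: int, t6: int, t_wait: int = 0) -> int:
    """Length of one external memory cycle Tmx in periods Tm (note 8)."""
    for name, value in zip(("T1", "T2", "T3", "T4", "T5"), (t1, t2, t3, t4, t5)):
        if not 1 <= value <= 4:
            raise ValueError(f"{name} must be 1-4 periods Tm, got {value}")
    if not 1 <= t6 <= 5:
        raise ValueError(f"T6 must be 1-5 periods Tm, got {t6}")
    if t_wait < 0:
        raise ValueError("Twait cannot be negative")
    return t1 + t2 + t3 + t4 + t_wait + t5 + t6


print(tmx_periods(1, 1, 1, 1, 1, 1), tmx_periods(4, 4, 4, 4, 4, 5, t_wait=3))  # 6 28
```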


[Timing diagram, one division per period Tm: notMemS1 and notMemS2 over Tstates T1 T2 T3 T4 T5 T6 T1 with no wait states, and over T1 T2 T3 T4 W W T5 T6 T1 with wait states inserted.]

Figure 8.5 IMS T800 effect of wait states on strobes
Table 8.4 Strobe S0 to ProcClockOut skew

| SYMBOL | PARAMETER | MIN | NOM | MAX | UNITS |
| TPCHS0H | Strobe rising from ProcClockOut rising | | | | |
| TPCLS0H | Strobe rising from ProcClockOut falling | | | | |
| TPCHS0L | Strobe falling from ProcClockOut rising | | | | |
| TPCLS0L | Strobe falling from ProcClockOut falling | | | | |

[Timing diagram: ProcClockOut and notMemS0, annotated with TPCHS0L, TPCLS0H, TPCLS0L and TPCHS0H.]

Figure 8.6 IMS T800 skew of notMemS0 to ProcClockOut


8.9 notMemWrB0-3 


Because the transputer uses word addressing, four write strobes are provided, one to write each byte of the
word. notMemWrB0 addresses the least significant byte.


The transputer has both early and late write cycle modes. For a late write cycle the relevant write strobes 
notMemWrB0-3 are low during T4 and T5; for an early write they are also low during T3. Data should be 
latched into memory on the rising edge of the strobes in both cases, although it is valid until the end of T6. 
If the strobe duration is insufficient, it may be extended at configuration time by adding extra periods Tm to 
either or both of Tstates T4 and T5 for both early and late modes. For an early cycle they may also be added 
to T3. Further extension may be obtained by inserting wait states at the end of T4. If the data hold time is 
insufficient, extra periods Tm may be added to T6 to extend it. 
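The early and late write modes differ only in whether T3 is included in the strobe. A minimal sketch, using the quantities d (early) and e (late) defined in the notes to Table 8.5; the function name is hypothetical.

```python
def write_strobe_periods(t3: int, t4: int, t5: int, t_wait: int = 0, early: bool = False) -> int:
    """Periods Tm for which notMemWrB0-3 are low.

    Late write:  T4 + Twait + T5
    Early write: T3 + T4 + Twait + T5
    """
    for name, value in (("T3", t3), ("T4", t4), ("T5", t5)):
        if not 1 <= value <= 4:
            raise ValueError(f"{name} must be 1-4 periods Tm, got {value}")
    if t_wait < 0:
        raise ValueError("Twait cannot be negative")
    late = t4 + t_wait + t5
    return t3 + late if early else late


print(write_strobe_periods(1, 1, 1), write_strobe_periods(1, 1, 1, early=True))  # 2 3
```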


Table 8.5 Write

| SYMBOL | PARAMETER | MIN | NOM | MAX | UNITS | NOTE |
| TdVWrH | Data setup before write | | | | ns | |
| TWrHdX | Data hold after write | | | | ns | |
| TS0LWrL | notMemS0 before start of early write | | | | ns | |
| | notMemS0 before start of late write | | | | ns | |
| TS0HWrH | End of write from end of notMemS0 | | | | ns | |
| TWrLWrH | Early write pulse width | | | | ns | |
| | Late write pulse width | | | | ns | |

Notes
1 Timing is for all write strobes notMemWrB0-3.
2 a is T6 where T6 can be from one to five periods Tm in length.
3 b is T2 where T2 can be from one to four periods Tm in length.
4 c is total of T2+T3 where T2, T3 can be from one to four periods Tm each in length.
5 d is total of T3+T4+Twait+T5 where T3, T4, T5 can be from one to four periods Tm each in length and Twait
may be zero or any number of periods Tm in length.
6 e is total of T4+Twait+T5 where T4, T5 can be from one to four periods Tm each in length and Twait may be
zero or any number of periods Tm in length.
[Timing diagram over Tstates T1 to T6: ProcClockOut, MemnotWrD0, MemAD2-31, notMemWrB0-B3 (early write), notMemWrB0-B3 (late write), notMemS0 (CE) and notMemS1 (ALE), annotated with TaVS0L, TS0LaX, TS0LWrL, TWrLWrH, TS0HWrH, TS0LS0H, TS0LS1L, TS0LS1H and TS0HS1H.]

Figure 8.7 IMS T800 external write cycle


In the write cycle timing diagram ProcClockOut is included as a guide only; it is shown with each Tstate 
configured to one period Tm. The strobe is inactive during internal memory cycles. 


Figure 8.8 IMS T800 dynamic RAM application
8.10 MemConfig


MemConfig is an input pin used to read configuration data when setting external memory interface (EMI) 
characteristics. It is read by the processor on two occasions after Reset goes low; first to check if one of the 
preset internal configurations is required, then to determine a possible external configuration. 


8.10.1 Internal configuration


The internal configuration scan comprises 64 periods TDCLDCL of Clockin during the internal scan period
of 144 Clockin periods. MemnotWrD0, MemnotRfD1 and MemAD2-31 are all high at the beginning of the
scan. Starting with MemnotWrD0, each of these lines goes low successively at intervals of two Clockin
periods and stays low until the end of the scan. If one of these lines is connected to MemConfig, the preset
internal configuration mode associated with that line will be used as the EMI configuration. The default
configuration is that defined in the table for MemAD31; connecting MemConfig to VCC will also produce
this default configuration. Note that only 17 of the possible configurations are valid; all others result in the
default configuration.
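The scan order above can be modelled directly. In this sketch (an illustration with hypothetical names), times are counted in Clockin periods from the moment MemnotWrD0 goes low, and Clockin is taken as the 5 MHz input used elsewhere in this datasheet.

```python
# Lines in the order they are taken low during the internal configuration scan.
SCAN_LINES = ["MemnotWrD0", "MemnotRfD1"] + [f"MemAD{n}" for n in range(2, 32)]

# The 17 lines that select a preset configuration (Table 8.6).
VALID_PRESETS = {"MemnotWrD0", "MemnotRfD1", "MemAD31"} | {f"MemAD{n}" for n in range(2, 16)}

CLOCKIN_NS = 200  # 5 MHz Clockin


def fall_time(line: str) -> int:
    """Clockin periods after MemnotWrD0 falls at which `line` goes low."""
    return 2 * SCAN_LINES.index(line)


def selected_preset(line: str) -> str:
    """Configuration used when MemConfig is tied to `line`; invalid lines give the default."""
    return line if line in VALID_PRESETS else "MemAD31"


print(len(SCAN_LINES) * 2 * CLOCKIN_NS, "ns")  # 12800 ns, i.e. 64 Clockin periods
print(fall_time("MemAD31"), selected_preset("MemAD20"))  # 62 MemAD31
```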


Table 8.6 IMS T800 internal configuration coding

| MemConfig connected to | T1 | T2 | T3 | T4 | T5 | T6 | s1 | s2 | s3 | s4 | Write cycle type | Refresh interval (Clockin cycles) | Proc cycles |
| MemnotWrD0 | | | | | | | | | | | | | |
| MemnotRfD1 | | | | | | | | | | | | | |
| MemAD2 | | | | | | | | | | | | | |
| MemAD3 | | | | | | | | | | | | | |
| MemAD4 | | | | | | | | | | | | | |
| MemAD5 | | | | | | | | | | | | | |
| MemAD6 | | | | | | | | | | | | | |
| MemAD7 | | | | | | | | | | | | | |
| MemAD8 | | | | | | | | | | | | | |
| MemAD9 | | | | | | | | | | | | | |
| MemAD10 | | | | | | | | | | | | | |
| MemAD11 | | | | | | | | | | | | | |
| MemAD12 | | | | | | | | | | | | | |
| MemAD13 | | | | | | | | | | | | | |
| MemAD14 | | | | | | | | | | | | | |
| MemAD15 | | | | | | | | | | | | | |
| MemAD31 | | | | | | | | | | | | | |

† Provided for static RAM only.
[Timing diagrams of notMemS0-4, notMemRd and notMemWr (early or late) over successive Tstates for the internal configurations selected by MemConfig=MemnotWrD0, MemConfig=MemAD3, MemConfig=MemnotRfD1 and MemConfig=MemAD7.]

Figure 8.9 IMS T800 internal configuration


8.10.2 External configuration 

If MemConfig is held low until MemnotWrD0 goes low, the internal configuration is ignored and an external
configuration will be loaded instead. An external configuration scan always follows an internal one, but if an
internal configuration occurs any external configuration is ignored.


The external configuration scan comprises 36 successive external read cycles, using the default EMI
configuration preset by MemAD31. However, instead of data being read on the data bus as for a normal read
cycle, only a single bit of data is read on MemConfig at each cycle. Addresses put out on the bus for each
read cycle are shown in table 8.7, and are designed to address ROM at the top of the memory map. The
table shows the data to be held in ROM; data required at the MemConfig pin is the inverse of this.
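The 36 read addresses and the fields they feed follow a simple pattern: consecutive words from #7FFFFF6C upwards, least significant bit of each field first. The sketch below regenerates that layout from the field widths given in Table 8.7 and the surrounding text; it is an illustration only, and the names are hypothetical.

```python
# (field number, name, width in bits) in scan order, as in Table 8.7.
LAYOUT = [
    (1, "T1", 2), (2, "T2", 2), (3, "T3", 2), (4, "T4", 2), (5, "T5", 2), (6, "T6", 2),
    (7, "notMemS1", 5), (8, "notMemS2", 5), (9, "notMemS3", 5), (10, "notMemS4", 5),
    (11, "Refresh interval", 2), (12, "Refresh enable", 1), (13, "Late write", 1),
]
FIRST_ADDRESS = 0x7FFFFF6C


def scan_addresses():
    """Yield (address, field, name, bit) for each configuration read, in order."""
    address = FIRST_ADDRESS
    for field, name, width in LAYOUT:
        for bit in range(width):          # least significant bit is read first
            yield address, field, name, bit
            address += 4                  # one 32-bit word per read


rows = list(scan_addresses())
print(len(rows), hex(rows[12][0]), hex(rows[-1][0]))  # 36 0x7fffff9c 0x7ffffff8
```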


MemConfig is typically connected via an inverter to MemnotWrD0. Data bit zero of the least significant byte
of each ROM word then provides the configuration data stream. By switching MemConfig between various
data bus lines, up to 32 configurations can be stored in ROM, one per bit of the data bus. MemConfig can be
permanently connected to a data line or to GND. Connecting MemConfig to GND gives all Tstates configured
to four periods; notMemS1 pulse of maximum duration; notMemS2-4 delayed by maximum; refresh interval
72 periods of Clockin; refresh enabled; late write.


The external memory configuration table 8.7 shows the contribution of each memory address to the 13
configuration fields. The lowest 12 words (#7FFFFF6C to #7FFFFF98, fields 1 to 6) define the number of extra
periods Tm to be added to each Tstate. If field 2 is 3 then three extra periods will be added to T2 to extend
it to the maximum of four periods.
[Timing diagram: internal configuration scan (64 periods of Clockin) followed by the external configuration scan (16 periods of Clockin, then reads at #7FFFFF6C, #7FFFFF70 and onwards), showing MemnotWrD0, MemnotRfD1, MemAD2, MemAD3, MemAD31 and MemConfig for cases (1) and (2).]

(1) Internal configuration: MemConfig connected to MemAD2
(2) External configuration: MemConfig connected to inverse of MemAD3

Figure 8.10 IMS T800 internal configuration scan


The next five addresses (field 7) define the duration of notMemS1 and the following fifteen (fields 8 to 10)
define the delays before strobes notMemS2-4 become active. The five bits allocated to each strobe allow
durations of from 0 to 31 periods Tm, as described under notMemS0-4 (page 33).


Addresses #7FFFFFEC to #7FFFFFF4 (fields 11 and 12) define the refresh interval and whether refresh is to 
be used, whilst the final address (field 13) supplies a high bit to MemConfig if a late write cycle is required. 
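Putting the field descriptions together, the 36 bits can be decoded as in the sketch below. It works on the bit values held in ROM as listed in Table 8.7 (MemConfig sees their inverse). Two points are assumptions beyond the text: a field value v is taken to give v+1 periods Tm, following the field 2 example, and refresh encodings 10 and 11 are taken to mean 54 and 72 Clockin periods, extrapolating Table 8.8 and matching the 72-period interval quoted for MemConfig tied to GND. Names are hypothetical.

```python
WIDTHS = [("T1", 2), ("T2", 2), ("T3", 2), ("T4", 2), ("T5", 2), ("T6", 2),
          ("S1", 5), ("S2", 5), ("S3", 5), ("S4", 5),
          ("refresh_interval", 2), ("refresh_enable", 1), ("late_write", 1)]


def decode(rom_bits: list[int]) -> dict:
    """Decode the 36 configuration bits (ROM values, scan order) into EMI settings."""
    if len(rom_bits) != 36:
        raise ValueError("expected 36 configuration bits")
    raw, pos = {}, 0
    for name, width in WIDTHS:
        raw[name] = sum((rom_bits[pos + i] & 1) << i for i in range(width))  # LSB read first
        pos += width
    config = {t: raw[t] + 1 for t in ("T1", "T2", "T3", "T4", "T5", "T6")}  # extra periods + 1
    config.update({s: raw[s] for s in ("S1", "S2", "S3", "S4")})  # S1 duration, S2-4 delays
    config["refresh_clockin"] = 18 * (raw["refresh_interval"] + 1)  # 18/36 given, 54/72 assumed
    config["refresh_enabled"] = bool(raw["refresh_enable"])
    config["late_write"] = bool(raw["late_write"])
    return config


def from_pin(pin_bits: list[int]) -> list[int]:
    """MemConfig carries the inverse of the ROM data."""
    return [1 - (b & 1) for b in pin_bits]


# MemConfig tied to GND reads 0 on every cycle, i.e. all-ones ROM data:
print(decode(from_pin([0] * 36)))
```

For the GND case this reports four-period Tstates, maximum notMemS1 duration and notMemS2-4 delays of 31, a 72-period refresh interval, refresh enabled and late write, in line with the GND configuration described above.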


The columns to the right of the coding table show the values of each configuration bit for the four sample 
external configuration diagrams. Note the inclusion of period E at the end of T6 in some diagrams. This is 
inserted to bring the start of the next Tstate T1 to coincide with a rising edge of ProcClockOut (page 31). 


Wait states W have been added to show their effect on strobe timing; they are not part of a configuration.
In each case that includes wait states, two wait periods are defined. This shows that if a wait state would
cause the start of T5 to coincide with a falling edge of ProcClockOut, another period Tm is generated by
the EMI to force it to coincide with a rising edge of ProcClockOut. This coincidence is only necessary if wait
states are added; otherwise coincidence with a falling edge is permitted. A configuration memory access
may be extended using MemWait only up to a total of 14 Clockin periods.
[Timing diagrams of notMemS0-4, notMemRd, notMemWr (early or late) and MemWait for four sample external configurations (Examples 1 to 4), including extra periods E at the end of T6 and wait periods W.]

(0) No wait states inserted
(1) One wait state inserted
(2) Two wait states inserted
(3) Three wait states inserted

Figure 8.11 IMS T800 external configuration


[Timing diagram: the internal configuration scan followed by the external configuration reads from #7FFFFF6C to #7FFFFFF8, showing MemnotWrD0, MemnotRfD1, MemAD2, MemAD31, MemConfig and notMemRd.]

(1) MemConfig connected to inverse of MemnotWrD0
(2) Configuration field 1; T1 configured for 2 periods Tm
(3) Configuration field 2; T2 configured for 3 periods Tm
(4) Configuration field 10; most significant bit of notMemS4 configured high
(5) Configuration field 11; refresh interval configured for 36 periods Clockin
(6) Configuration field 12; refresh enabled
(7) Configuration field 13; early write cycle

Figure 8.12 IMS T800 external configuration scan




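Table 8.7 IMS T800 external configuration coding

| Scan cycle | MemAD address | Field | Function | Example 1 | Example 2 | Example 3 | Example 4 |
| 1 | 7FFFFF6C | 1 | T1 least significant bit | | | | |
| 2 | 7FFFFF70 | 1 | T1 most significant bit | | | | |
| 3 | 7FFFFF74 | 2 | T2 least significant bit | | | | |
| 4 | 7FFFFF78 | 2 | T2 most significant bit | | | | |
| 5 | 7FFFFF7C | 3 | T3 least significant bit | | | | |
| 6 | 7FFFFF80 | 3 | T3 most significant bit | | | | |
| 7 | 7FFFFF84 | 4 | T4 least significant bit | | | | |
| 8 | 7FFFFF88 | 4 | T4 most significant bit | | | | |
| 9 | 7FFFFF8C | 5 | T5 least significant bit | | | | |
| 10 | 7FFFFF90 | 5 | T5 most significant bit | | | | |
| 11 | 7FFFFF94 | 6 | T6 least significant bit | | | | |
| 12 | 7FFFFF98 | 6 | T6 most significant bit | | | | |
| 13 | 7FFFFF9C | 7 | notMemS1 least significant bit | | | | |
| 14 | 7FFFFFA0 | 7 | notMemS1 bit 1 | | | | |
| 15 | 7FFFFFA4 | 7 | notMemS1 bit 2 | | | | |
| 16 | 7FFFFFA8 | 7 | notMemS1 bit 3 | | | | |
| 17 | 7FFFFFAC | 7 | notMemS1 most significant bit | | | | |
| 18 | 7FFFFFB0 | 8 | notMemS2 least significant bit | | | | |
| 19 | 7FFFFFB4 | 8 | notMemS2 bit 1 | | | | |
| 20 | 7FFFFFB8 | 8 | notMemS2 bit 2 | | | | |
| 21 | 7FFFFFBC | 8 | notMemS2 bit 3 | | | | |
| 22 | 7FFFFFC0 | 8 | notMemS2 most significant bit | | | | |
| 23 | 7FFFFFC4 | 9 | notMemS3 least significant bit | | | | |
| 24 | 7FFFFFC8 | 9 | notMemS3 bit 1 | | | | |
| 25 | 7FFFFFCC | 9 | notMemS3 bit 2 | | | | |
| 26 | 7FFFFFD0 | 9 | notMemS3 bit 3 | | | | |
| 27 | 7FFFFFD4 | 9 | notMemS3 most significant bit | | | | |
| 28 | 7FFFFFD8 | 10 | notMemS4 least significant bit | | | | |
| 29 | 7FFFFFDC | 10 | notMemS4 bit 1 | | | | |
| 30 | 7FFFFFE0 | 10 | notMemS4 bit 2 | | | | |
| 31 | 7FFFFFE4 | 10 | notMemS4 bit 3 | | | | |
| 32 | 7FFFFFE8 | 10 | notMemS4 most significant bit | | | | |
| 33 | 7FFFFFEC | 11 | Refresh Interval least significant bit | | | | |
| 34 | 7FFFFFF0 | 11 | Refresh Interval most significant bit | | | | |
| 35 | 7FFFFFF4 | 12 | Refresh Enable | | | | |
| 36 | 7FFFFFF8 | 13 | Late Write | | | | |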
Table 8.8 IMS T800 memory refresh configuration coding

| Refresh interval (periods of Clockin) | Interval in µs | Field 11 encoding | Complete cycle (ms) |
| 18 | 3.6 | 00 | 0.92 |
| 36 | 7.2 | 01 | 1.84 |

Refresh intervals are in periods of Clockin and Clockin frequency is 5 MHz:
Interval = 18 × 200 ns = 3600 ns

Refresh interval is between successive incremental refresh addresses.
Complete cycles are shown for 256 row DRAMs.
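The arithmetic behind Table 8.8 is shown below as a short sketch (function names hypothetical): the interval between refresh addresses is the configured number of Clockin periods times 200 ns, and a complete pass over a 256-row DRAM takes 256 such intervals.

```python
CLOCKIN_NS = 200  # 5 MHz Clockin


def refresh_interval_ns(clockin_periods: int) -> int:
    """Time between successive incremental refresh addresses."""
    return clockin_periods * CLOCKIN_NS


def complete_cycle_ms(clockin_periods: int, rows: int = 256) -> float:
    """Time to refresh every row once."""
    return rows * refresh_interval_ns(clockin_periods) / 1_000_000


for periods in (18, 36):
    print(periods, refresh_interval_ns(periods), round(complete_cycle_ms(periods), 2))
# 18 3600 0.92
# 36 7200 1.84
```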


Table 8.9 Memory configuration

| SYMBOL | PARAMETER | MIN | NOM | MAX | UNITS | NOTE |
| TMCVRdH | Memory configuration data setup | 30 | | | ns | |
| TRdHMCX | Memory configuration data hold | 0 | | | ns | |
| TS0LRdH | notMemS0 to configuration data read | | a | | ns | 1 |

Notes
1 a is 16 periods Tm.


[Timing diagram of a configuration read cycle, one period Tm per Tstate: MemnotWrD0, MemnotRfD1 and MemAD2-31 (address then data), notMemS0 and notMemRd, annotated with TS0LS0H.]

Figure 8.13 IMS T800 external configuration read cycle timing
8.11 notMemRf


The IMS T800 can be operated with memory refresh enabled or disabled. The selection is made during 
memory configuration, when the refresh interval is also determined. Refresh cycles do not interrupt internal 
memory accesses, although the internal addresses cannot be reflected on the external bus during refresh. 


When refresh is disabled no refresh cycles occur. During the post-Reset period eight dummy refresh cycles 
will occur with the appropriate timing but with no bus or strobe activity. 


A refresh cycle uses the same basic external memory timing as a normal external memory cycle, except that 
it starts two periods Tm before the start of T1. If a refresh cycle is due during an external memory access, 
it will be delayed until the end of that external cycle. Two extra periods Tm (periods R in the diagram) will 
then be inserted between the end of T6 of the external memory cycle and the start of T1 of the refresh cycle 
itself. The refresh address and various external strobes become active approximately one period Tm before 
T1. Bus signals are active until the end of T2, whilst notMemRf remains active until the end of T6. 


For a refresh cycle, MemnotRfD1 goes low before notMemRf goes low, and MemnotWrD0 goes high with
the same timing as MemnotRfD1. All the address lines share the same timing, but only MemAD2-11 give
the refresh address. MemAD12-30 stay high during the address period, whilst MemAD31 remains low.
Refresh cycles generate strobes notMemS0-4 with timing as for a normal external cycle, but notMemRd and
notMemWrB0-3 remain high. MemWait operates normally during refresh cycles.
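The address-period pattern of a refresh cycle can be summarised as a 32-bit value, treating MemnotWrD0 and MemnotRfD1 as bits 0 and 1 alongside MemAD2-31. This is a modelling convenience only, and the names are hypothetical; the wrap-around of the 10-bit row counter in next_row is an assumption, since the text only says the refresh addresses are incremental.

```python
ROW_BITS = 10  # MemAD2-11 carry the refresh address


def refresh_bus_value(row: int) -> int:
    """Logic levels on MemnotWrD0, MemnotRfD1 and MemAD2-31 during the refresh address period."""
    if not 0 <= row < 1 << ROW_BITS:
        raise ValueError("refresh address must fit in MemAD2-11")
    value = 1 << 0                      # MemnotWrD0 high (not a write)
    # bit 1 (MemnotRfD1) left low: this is a refresh address
    value |= row << 2                   # MemAD2-11: refresh address
    value |= ((1 << 19) - 1) << 12      # MemAD12-30 high
    # bit 31 (MemAD31) left low
    return value


def next_row(row: int) -> int:
    """Assumed: the refresh address simply increments and wraps."""
    return (row + 1) % (1 << ROW_BITS)


print(hex(refresh_bus_value(0)), hex(refresh_bus_value(0x3FF)))  # 0x7ffff001 0x7ffffffd
```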


Table 8.10 Memory refresh

| SYMBOL | PARAMETER | MIN | NOM | MAX | UNITS | NOTE |
| TRfLRfH | Refresh pulse width low | | | a+6 | ns | 1 |
| TRaVS0L | Refresh address setup before notMemS0 | | | | ns | 2 |
| TRfLS0L | Refresh indicator setup before notMemS0 | | | | ns | 2 |

Notes
1 a is total Tmx+Tm.
2 b is total T1+Tm where T1 can be from one to four periods Tm in length.


[Timing diagram over Tstates T6 R R T1 T2 T3 T4 T5 T6 T1: MemAD2-31 (address then data), MemAD2-11, notMemS0, notMemRf, MemnotWrD0, MemnotRfD1, MemAD12-30 and MemAD31, annotated with the refresh timings of Table 8.10.]

Figure 8.14 IMS T800 refresh cycle timing


8.12 MemWait 


Taking MemWait high with the timing shown will extend the duration of T4. MemWait is sampled relative 
to the falling edge of ProcClockOut during a T3 period, and should not change state in this region. By 
convention, notMemS4 is used to synchronize wait state insertion. If this or another strobe is used, its delay 
should be such as to take the strobe low an even number of periods Tm after the start of T1, to coincide with 
a rising edge of ProcClockOut. 


MemWait may be kept high indefinitely, although if dynamic memory refresh is used it should not be kept 
high long enough to interfere with refresh timing. MemWait operates normally during all cycles, including 
refresh and configuration cycles. It does not affect internal memory access in any way. 


If the start of T5 would coincide with a falling edge of ProcClockOut, an extra wait period Tm (EW) is
generated by the EMI to force coincidence with a rising edge. Rising edge coincidence is only forced if wait
states are added; otherwise coincidence with a falling edge is permitted.
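The rule for the extra wait period EW amounts to a parity check. The sketch below assumes, as the strobe synchronization advice above implies, that T1 starts on a rising edge of ProcClockOut and that one ProcClockOut period is two periods Tm, so an even offset from the start of T1 falls on a rising edge. The function names are hypothetical.

```python
def on_rising_edge(offset_tm: int) -> bool:
    """True if an instant `offset_tm` periods Tm after the start of T1 is a rising edge
    of ProcClockOut (assumes T1 starts on a rising edge and ProcClockOut = 2 Tm)."""
    return offset_tm % 2 == 0


def wait_periods(t1: int, t2: int, t3: int, t4: int, requested: int) -> int:
    """Wait periods actually inserted, including the extra period EW when needed."""
    if requested == 0:
        return 0                          # no waits: T5 may start on a falling edge
    t5_start = t1 + t2 + t3 + t4 + requested
    return requested if on_rising_edge(t5_start) else requested + 1


print(wait_periods(1, 1, 1, 1, 1), wait_periods(1, 1, 1, 1, 2), wait_periods(1, 2, 1, 1, 0))  # 2 2 0
```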
Table 8.11 Memory wait

| SYMBOL | PARAMETER | MIN | NOM | MAX | UNITS | NOTE |
| TPCHWtH | Wait setup | 0.5Tm+3 | | | ns | 1,2 |
| TPCHWtL | Wait hold | 0.5Tm+3 | | | ns | 1,2 |
| TWtLWtH | Delay before re-assertion of Wait | 2Tm | | | ns | |

Notes
1 ProcClockOut load should not exceed 50 pF.
2 If wait period exceeds refresh interval, refresh cycles will be lost.


[Timing diagrams: ProcClockOut, MemWait, the multiplexed address/data bus (address, data, address) and notMemRd around T2, and two cases of ProcClockOut with MemWait during T3.]

Figure 8.15 IMS T800 memory wait timing




8.13 MemReq, MemGranted 


Direct memory access (DMA) can be requested at any time by taking the asynchronous MemReq input high.
The transputer samples MemReq during the final period Tm of T6 of both refresh and external memory
cycles. To guarantee taking over the bus immediately following either, MemReq must be set up at least two
periods Tm before the end of T6. In the absence of an external memory cycle, MemReq is sampled during
every low period of ProcClockOut. The address bus is tristated two periods Tm after the ProcClockOut
rising edge which follows the sample. MemGranted is asserted one period Tm after that.


Removal of MemReq is sampled during each low period of ProcClockOut and MemGranted is removed 
synchronously with the next falling edge of ProcClockOut. If accurate timing of DMA is required, MemReq 
should be set low coincident with a falling edge of ProcClockOut. Further external bus activity, either refresh, 
external cycles or reflection of internal cycles, will commence at the next rising edge of ProcClockOut. 


Strobes are left in their inactive states during DMA. DMA cannot interrupt a refresh or external memory cycle, 
and outstanding refresh cycles will occur before the bus is released to DMA. DMA does not interfere with 
internal memory cycles in any way, although a program running in internal memory would have to wait for 
the end of DMA before accessing external memory. DMA cannot access internal memory. If DMA extends 
longer than one refresh interval (Memory Refresh Configuration Coding, table 8.8), the DMA user becomes 
responsible for refresh. DMA may also inhibit an internally running program from accessing external memory. 


DMA allows a bootstrap program to be loaded into external RAM ready for execution after reset. If MemReq is 
held high throughout reset, MemGranted will be asserted before the bootstrap sequence begins. MemReq 
must be high at least one period TDCLDCL of Clockin before Reset. The circuit should be designed to 
ensure correct operation if Reset could interrupt a normal DMA cycle. 


Table 8.12 Memory request

| SYMBOL | PARAMETER | MIN | NOM | MAX | UNITS | NOTE |
| TMRHMGH | Memory request response time | | | | | |
| TMRLMGL | Memory request end response time | | | | | |
| TADZMGH | Bus tristate before memory granted | | | | | |
| TMGLADV | Bus active after end of memory granted | | | | | |

Notes
1 These values assume no external memory cycle is in progress. If an external cycle is active, maximum time
could be (1 EMI cycle Tmx)+(1 refresh cycle TRfLRfH)+(6 periods Tm).
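Note 1 gives the worst case when MemReq arrives during an external cycle. The sketch below simply adds the three terms in periods Tm; the caller supplies Tmx and the refresh pulse width, both in periods Tm. In the usage line, 7 stands for Tmx + Tm (after note 1 of Table 8.10) and 25 ns for Tm; both are illustrative values only, and the function names are hypothetical.

```python
def worst_case_grant_tm(tmx: int, refresh_low: int) -> int:
    """Upper bound on MemReq to MemGranted with an external cycle active, in periods Tm:
    one EMI cycle Tmx + one refresh cycle TRfLRfH + 6 periods Tm (note 1)."""
    if tmx < 6 or refresh_low < 0:
        raise ValueError("Tmx is at least 6 periods Tm; refresh width cannot be negative")
    return tmx + refresh_low + 6


def to_ns(periods_tm: int, tm_ns: float) -> float:
    """Convert a count of periods Tm to nanoseconds."""
    return periods_tm * tm_ns


bound = worst_case_grant_tm(tmx=6, refresh_low=7)
print(bound, to_ns(bound, tm_ns=25.0))  # 19 475.0
```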


[Timing diagram: ProcClockOut, MemReq, MemGranted, MemnotWrD0, MemnotRfD1 and MemAD2-31, annotated with TMRLMGL and TADZMGH.]

Figure 8.16 IMS T800 memory request timing
[Timing diagram: MemReq, MemGranted and Reset against the configuration sequence.]

D Pre- and post-configuration delays (figure 6.3)
I Internal configuration sequence
E External configuration sequence
R Initial refresh sequence
B Bootstrap sequence

Figure 8.17 IMS T800 DMA sequence at reset


[Timing diagram: MemReq, MemGranted, MemnotRfD1 and MemAD2-31 across external interface cycles (read or write, refresh, read or write).]

Figure 8.18 IMS T800 operation of MemReq, MemGranted with external, refresh memory cycles


[Timing diagram: MemReq, MemGranted, MemnotWrD0 and MemAD2-31 while internal memory cycles run alongside external memory interface cycles (T1 to T6).]

Figure 8.19 IMS T800 operation of MemReq, MemGranted with external, internal memory cycles
9 Events


EventReq and EventAck provide an asynchronous handshake interface between an external event and an 
internal process. When an external event takes EventReq high the external event channel (additional to the 
external link channels) is made ready to communicate with a process. When both the event channel and the 
process are ready the processor takes EventAck high and the process, if waiting, is scheduled. EventAck 
is removed after EventReq goes low. 


Only one process may use the event channel at any given time. If no process requires an event to occur 
EventAck will never be taken high. Although EventReq triggers the channel on a transition from low to high, 
it must not be removed before EventAck is high. EventReq should be low during Reset; if not it will be 
ignored until it has gone low and returned high. EventAck is taken low when Reset occurs. 
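The handshake can be expressed as a small state model. This is a Python sketch of the protocol as described above, not of the processor's internals; the class and method names are hypothetical, it models a single process using the channel, and it treats removing EventReq before EventAck is high as an error.

```python
class EventChannel:
    """Toy model of the EventReq/EventAck handshake (True = high)."""

    def __init__(self) -> None:
        self.event_req = False
        self.event_ack = False      # EventAck is low after Reset
        self._ready = False         # channel made ready by a rising edge on EventReq
        self._waiting = False       # a process is waiting on the event channel
        self.scheduled = 0          # times the waiting process has been scheduled

    def drive_event_req(self, level: bool) -> None:
        if level and not self.event_req:
            self._ready = True                  # low-to-high transition triggers the channel
        if not level and self.event_req:
            if not self.event_ack:
                raise RuntimeError("EventReq removed before EventAck went high")
            self.event_ack = False              # EventAck is removed after EventReq goes low
        self.event_req = level
        self._complete()

    def process_waits(self) -> None:
        self._waiting = True
        self._complete()

    def _complete(self) -> None:
        if self._ready and self._waiting:       # both sides ready: acknowledge and schedule
            self.event_ack = True
            self._ready = self._waiting = False
            self.scheduled += 1


ch = EventChannel()
ch.drive_event_req(True)   # external event: channel ready, no process yet, EventAck stays low
ch.process_waits()         # a process inputs from the event channel: EventAck goes high
ch.drive_event_req(False)  # EventReq removed: EventAck follows it low
print(ch.event_ack, ch.scheduled)  # False 1
```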


If the process is a high priority one and no other high priority process is running, the latency is as described 
on page 10. Setting a high priority task to wait for an event input is a way of interrupting a transputer program. 


Table 9.1 Event

| SYMBOL | PARAMETER | MIN | NOM | MAX | UNITS | NOTE |
| TVHKH | Event request response | | | | | 1 |
| TKHVL | Event request hold | | | | | |
| TVLKL | Delay before removal of event acknowledge | | | | | |
| TKLVH | Delay before re-assertion of event request | | | | | |
| TKHEWL | Event acknowledge to end of event waiting | | | | | |
| TKLEWH | End of event acknowledge to event waiting | | | | | |

Notes
1 a is 3 processor cycles TPCLPCL.


[Timing diagram: EventReq and EventAck, annotated with TVHKH.]

Figure 9.1 IMS T800 event timing
10 Links


Four identical INMOS bi-directional serial links provide synchronized communication between processors 
and with the outside world. Each link comprises an input channel and output channel. A link between two 
transputers is implemented by connecting a link interface on one transputer to a link interface on the other 
transputer. Every byte of data sent on a link is acknowledged on the input of the same link, thus each signal 
line carries both data and control information. 


The quiescent state of a link output is low. Each data byte is transmitted as a high start bit followed by a one 
bit followed by eight data bits followed by a low stop bit. The least significant bit of data is transmitted first. 
After transmitting a data byte the sender waits for the acknowledge, which consists of a high start bit followed 
by a zero bit. The acknowledge signifies both that a process was able to receive the acknowledged data byte 
and that the receiving link is able to receive another byte. The sending link reschedules the sending process 
only after the acknowledge for the final byte of the message has been received. 
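The packet formats can be written down as bit lists. This sketch (function names hypothetical) builds the 11-bit data packet and the 2-bit acknowledge, and parses a received bit stream, skipping the quiescent low level between packets. It works at the level of bits only and ignores timing and the overlapped acknowledge.

```python
ACK_PACKET = [1, 0]  # high start bit followed by a zero bit


def data_packet(byte: int) -> list[int]:
    """Start bit, a one bit, eight data bits (least significant first), low stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a data packet carries one byte")
    return [1, 1] + [(byte >> i) & 1 for i in range(8)] + [0]


def parse_stream(bits: list[int]) -> list:
    """Split a received bit stream into data bytes and 'ack' markers."""
    out, i = [], 0
    while i < len(bits):
        if bits[i] == 0:
            i += 1                        # quiescent (low) line between packets
            continue
        if i + 1 >= len(bits):
            raise ValueError("truncated packet")
        if bits[i + 1] == 0:              # start bit then 0: acknowledge
            out.append("ack")
            i += 2
            continue
        frame = bits[i:i + 11]            # start bit then 1: data packet
        if len(frame) < 11 or frame[10] != 0:
            raise ValueError("malformed data packet")
        out.append(sum(b << k for k, b in enumerate(frame[2:10])))
        i += 11
    return out


stream = [0, 0] + data_packet(0xA5) + [0] + ACK_PACKET
print(parse_stream(stream))  # [165, 'ack']
```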


The IMS T800 links allow an acknowledge packet to be sent before the data packet has been fully received. 
This overlapped acknowledge technique is fully compatible with all other INMOS transputer links. 


The IMS T800 links support the standard INMOS communication speed of 10 Mbits/sec. In addition they can
be used at 5 or 20 Mbits/sec. Links are not synchronised with Clockin or ProcClockOut and are insensitive
to their phases. Thus links from independently clocked systems may communicate, provided only that the
clocks are nominally identical and within specification.


Links are TTL compatible and intended to be used in electrically quiet environments, between devices on a 
single printed circuit board or between two boards via a backplane. Direct connection may be made between 
devices separated by a distance of less than 300 millimetres. For longer distances a matched 100 Ohm 
transmission line should be used with series matching resistors RM. When this is done the line delay should 
be less than 0.4 bit time to ensure that the reflection returns before the next data bit is sent. 
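The 0.4 bit time limit translates directly into a delay budget for each link speed. In the sketch below (function names hypothetical) the propagation delay per metre is a parameter; the 5 ns/m used in the example is an illustrative assumption, not a value from this datasheet.

```python
def max_line_delay_ns(mbits_per_s: float) -> float:
    """Largest line delay that keeps the reflection within 0.4 of a bit time."""
    bit_time_ns = 1000.0 / mbits_per_s
    return 0.4 * bit_time_ns


def max_line_length_m(mbits_per_s: float, delay_ns_per_m: float) -> float:
    """Corresponding line length for an assumed propagation delay per metre."""
    return max_line_delay_ns(mbits_per_s) / delay_ns_per_m


for speed in (5, 10, 20):
    print(speed, round(max_line_delay_ns(speed), 1), round(max_line_length_m(speed, 5.0), 1))
# 5 80.0 16.0
# 10 40.0 8.0
# 20 20.0 4.0
```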


Buffers may be used for very long transmissions. If so, their overall propagation delay should be stable within 
the skew tolerance of the link, although the absolute value of the delay is immaterial. 


Link speeds can be set by LinkSpecial, Link0Special and Link123Special. The link 0 speed can be set
independently. Table 10.1 shows uni-directional and bi-directional data rates in Kbytes/sec for each link
speed; LinknSpecial is to be read as Link0Special when selecting link 0 speed and as Link123Special for
the others. Data rates are quoted for a transputer using internal memory, and will be affected by a factor
depending on the number of external memory accesses and the length of the external memory cycle.


Table 10.1 Speed Settings for Transputer Links

| LinkSpecial | LinknSpecial | Mbits/sec | Kbytes/sec Uni | Kbytes/sec Bi |

[Diagram: data packet H H 0 1 2 3 4 5 6 7 L; acknowledge packet H L.]

Figure 10.1 IMS T800 link data and acknowledge packets
Table 10.2 Link

| SYMBOL | PARAMETER | MIN | NOM | MAX | UNITS | NOTE |
| TJQr | LinkOut rise time | | | | | |
| TJQf | LinkOut fall time | | | | | |
| TJDr | LinkIn rise time | | | | | |
| TJDf | LinkIn fall time | | | | | |
| TJQJD | Buffered edge delay | | | | | |
| TJBskew | Variation in TJQJD, 20 Mbits/s | | | | | 1 |
| | Variation in TJQJD, 10 Mbits/s | | | | | 1 |
| | Variation in TJQJD, 5 Mbits/s | | | | | 1 |
| CLIZ | LinkIn capacitance @ f=1MHz | | | | | |
| CLL | LinkOut load capacitance | | | | | |
| RM | Series resistor for 100 Ω transmission line | | | | | |

Notes
1 This is the variation in the total delay through buffers, transmission lines, differential receivers etc., caused by
such things as short term variation in supply voltages and differences in delays for rising and falling edges.


Figure 10.2 IMS T800 link timing 


[Figure: LinkOut edge at 1.5V and the corresponding LinkIn edge at 1.5V, separated by a delay between the earliest and latest TJQJD]


Figure 10.3 IMS T800 buffered link timing
[Figure: LinkOut of transputer family device A wired directly to LinkIn of device B, and LinkOut of device B wired directly to LinkIn of device A]


Figure 10.4 IMS T800 Links directly connected


[Figure: LinkOut of each transputer family device (A and B) drives LinkIn of the other through a transmission line with Zo = 100 Ohms]


Figure 10.5 IMS T800 Links connected by transmission line


[Figure: LinkOut of each transputer family device (A and B) drives LinkIn of the other through a buffer]


Figure 10.6 IMS T800 Links connected by buffers
11 Electrical specifications


11.1 DC electrical characteristics


Table 11.1 Absolute maximum ratings 


SYMBOL  PARAMETER                                MIN  MAX  UNITS
        DC supply voltage
        Voltage on input and output pins
        Input current
        Output short circuit time (one pin)                s
        Storage temperature                                °C
        Ambient temperature under bias                     °C
        Maximum allowable dissipation                      W


1 All voltages are with respect to GND. 


2 This is a stress rating only and functional operation of the device at these or any other conditions beyond those 
indicated in the operating sections of this specification is not implied. Stresses greater than those listed may 
cause permanent damage to the device. Exposure to absolute maximum rating conditions for extended periods 
may affect reliability. 


3 This device contains circuitry to protect the inputs against damage caused by high static voltages or electrical 
fields. However, it is advised that normal precautions be taken to avoid application of any voltage higher than the 
absolute maximum rated voltages to this high impedance circuit. Unused inputs should be tied to an appropriate 
logic level such as VCC or GND. 


4 The input current applies to any input or output pin and applies when the voltage on the pin is between GND 
and VCC. 


Table 11.2 Operating conditions 


SYMBOL PARAMETER | MAX | UNITS | NOTE 


DC supply voltage 
Input or output voltage 


Load capacitance on any pin 
Operating temperature range IMS T800-S 
Operating temperature range IMS T800-M 


Notes 


1 All voltages are with respect to GND. 
2 Excursions beyond the supplies are permitted but not recommended; see DC characteristics. 


3 Air flow rate 400 linear ft/min transverse air flow. 




Table 11.3 DC characteristics 


SYMBOL  PARAMETER                                     MIN  MAX  UNITS  NOTE
        High level input voltage
        Low level input voltage
        Input current @ GND<VI<VCC
        Output high voltage @ IOH=2mA
        Output low voltage @ IOL=4mA
        Output short circuit current @ GND<VO<VCC
        Tristate output current @ GND<VO<VCC
        Power dissipation
        Input capacitance @ f=1MHz
        Output capacitance @ f=1MHz


Notes


1 All voltages are with respect to GND. 

2 Parameters for IMS T800-S measured at 4.75V<VCC<5.25V and 0°C<TA<70°C. 
Parameters for IMS T800-M measured at 4.75V<VCC<5.25V and -55°C<TA<125°C. 
Input clock frequency = 5MHz. 

3 Current sourced from non-link outputs. 

4 Current sourced from link outputs. 


5 Power dissipation varies with output loading and program execution.
Power dissipation is quoted for a processor operating at 20MHz.


6 This parameter is sampled and not 100% tested. 
7 Parameter for IMS T800-S. 
8 Parameter for IMS T800-M. 


11.2 Equivalent circuits 


Load for:        R1     R2     Equivalent load:


Link outputs     1K96   47K    1 Schottky TTL input
Other outputs    970R   24K    2 Schottky TTL inputs


Diodes are 1N916 


Figure 11.1 Load circuit for AC measurements 


[Figure: output under test connected to a test point, with 50pF to GND]


Figure 11.2 Tristate load circuit for AC measurements 


11.3 AC timing characteristics 


Table 11.4 Input, output edges 


SYMBOL    PARAMETER                   MIN  MAX  UNITS  NOTE
TDr       Input rising edges
TDf       Input falling edges

TQr       Output rising edges
TQf       Output falling edges

TS0LaHZ   Address high to tristate

TS0LaLZ   Address low to tristate


Notes 


1 Non-link pins; see section on links. 
2 All inputs except Clockin; see section on Clockin. 


3 ais T2 where T2 can be from one to four periods Tm in length. 
Address lines include MemnotWrD0, MemnotRfD1, MemAD2-31. 


Figure 11.4 IMS T800 tristate timing relative to notMemS0
[Figure: typical rise and fall times (ns) versus load capacitance (pF, 40 to 100), with one panel for link outputs and one for EMI outputs]


Figure 11.5 Typical rise/fall times


Notes


1 Skew is measured between notMemS0 with a standard load (2 Schottky TTL inputs and 30pF) and 
notMemS0 with a load of 2 Schottky TTL inputs and varying capacitance. 


11.4 Power rating 


Internal power dissipation $P_{INT}$ of transputer and peripheral chips depends on VCC, as shown in figure 11.6.
$P_{INT}$ is substantially independent of temperature.


Total power dissipation $P_D$ of the chip is

$$P_D = P_{INT} + P_{IO}$$

where $P_{IO}$ is the power dissipation in the input and output pins; this is application dependent.
Internal working temperature $T_J$ of the chip is

$$T_J = T_A + \theta_{JA} \times P_D$$

where $T_A$ is the external ambient temperature in °C and $\theta_{JA}$ is the junction-to-ambient thermal resistance in
°C/W. $\theta_{JA}$ for each package is given in the Package specifications section.
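The two formulas can be chained to estimate junction temperature. The sketch below is a direct transcription. The inputs in the usage line are placeholders, not datasheet values. In practice, read $P_{INT}$ from figure 11.6, estimate $P_{IO}$ for the board, and take $\theta_{JA}$ from Table 13.2.

```python
# Sketch: junction temperature from the power-rating formulas above.
# All numeric inputs in the usage example are placeholders, not datasheet values.

def total_dissipation_w(p_int_w: float, p_io_w: float) -> float:
    """P_D = P_INT + P_IO, in watts."""
    return p_int_w + p_io_w


def junction_temp_c(t_ambient_c: float, theta_ja_c_per_w: float,
                    p_int_w: float, p_io_w: float) -> float:
    """T_J = T_A + theta_JA * P_D, in degrees C."""
    return t_ambient_c + theta_ja_c_per_w * total_dissipation_w(p_int_w, p_io_w)


if __name__ == "__main__":
    # Placeholder inputs: T_A = 70 C, theta_JA = 40 C/W, P_INT = 0.5 W, P_IO = 0.25 W
    print(junction_temp_c(70.0, 40.0, 0.5, 0.25))  # prints 100.0
```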


[Figure: internal power dissipation PINT (mW, 500 to 800) versus VCC (4.4 to 5.6 Volts), with curves for T800-17, T800-20 and T800-25]


Figure 11.6 IMS T800 internal power dissipation vs VCC 


[Figure: typical power dissipation PD (mW, 500 to 650) versus processor frequency (MHz)]


Figure 11.7 IMS T800 typical power dissipation with processor speed 
12 Performance


The performance of the transputer is measured in terms of the number of bytes required for the program, and 
the number of (internal) processor cycles required to execute the program. The figures here relate to occam 
programs. For the same function, other languages should achieve approximately the same performance as 
occam. 


With transputers incorporating an FPU, this type of performance calculation is straightforward when
considering only integer data types. However, when floating point calculations using the REAL32 and REAL64
data types are present in the program, complications arise due to the concurrency inherent in the transputer's
design, whereby integer calculations can be overlapped with floating point calculations. A more comprehensive
guide to the impact of this concurrency on transputer performance can be found in The Transputer Instruction
Set - A Compiler Writers' Guide.


12.1 Performance overview 


These figures are averages obtained from detailed simulation, and should be used only as an initial guide; 
they assume operands are of type INT. The abbreviations in table 12.1 are used to represent the quantities 
indicated. In the replicator section of the table, figures in braces {} are not necessary if the number of 
replications is a compile time constant. To estimate performance, add together the time for the variable 
references and the time for the operation. 


Table 12.1 Key to performance table 


np     number of component processes
ne     number of processes earlier in queue
r      1 if INT parameter or array parameter, 0 if not
       number of table entries (table size)
       width of constant in nibbles
       number of places to shift
Eg     expression used in a guard
Et     timer expression used in a guard
       most significant bit set of multiplier ((-1) if the multiplier is 0)
Tbp    most significant bit set in a positive multiplier when counting from zero ((-1) if the multiplier is 0)
Tbe    most significant bit set in the two's complement of a negative multiplier
nsp    number of scalar parameters in a procedure
nap    number of array parameters in a procedure
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Table 12.2 Performance 


Size (bytes) | Time (cycles)


Names 


variables 
in expression 2.1+2(r) 
assigned to or input to 1.1+(r) 


in PROC or FUNCTION call, 
corresponding to an INT parameter 
channels 


Array Variables (for single dimension arrays) 
constant subscript 
variable subscript 
expression subscript 


Declarations 
CHAN OF protocol 
[size]CHAN OF protocol 
PROC 
Primitives 
assignment 
input 
output 
STOP 
SKIP 


Arithmetic operators 


1.1+(r) 
2.1 


0 
7.3 
7.3 


3.1 
2.2 + 20.2+size 


REM 
>> << 


Modulo Arithmetic operators 
PLUS 
MINUS 
TIMES (fast multiply, positive operand) 
TIMES (fast multiply, negative operand) 


Boolean operators 
OR 
AND NOT 


Comparison operators 
= constant 

= variable 

<> constant 

<> variable 


<= 
Bit operators


Expressions 
constant in expression 
check if error 




Table 12.3 Performance

                                   Size (bytes)         Time (cycles)
Timers
  timer input                                           3
  timer AFTER
    if past time                                        4
    with empty timer queue                              31
    non-empty timer queue                               38+ne*9
  ALT (timer)
    with empty timer queue                              52
    non-empty timer queue                               59+ne*9
  timer alt guard                  8+2Eg+2Et            34+2Eg+2Et

Constructs
  SEQ                              0                    0
  IF                                                    1.4
  if guard                                              4.3
  ALT (non timer)                                       26
  alt channel guard                10.2+2Eg             20+2Eg
  skip alt guard                   8+2Eg                10+2Eg
  PAR                              11.5+(np-1)*7.5      19.5+(np-1)*30.5
  WHILE                            4                    12

Procedure or function call         3.5+(nsp-2)*1.1      16.5+(nsp-2)*1.1
                                   +nap*2.3             +nap*2.3

Replicators
  replicated SEQ                   7.3{+5.1}            (-3.8)+15.1*count{+7.1}
  replicated IF                    12.3{+5.1}           (-2.6)+19.4*count{+7.1}
  replicated ALT                   24.8{+10.2}          25.4+33.4*count{+14.2}
  replicated timer ALT             24.8{+10.2}          62.4+33.4*count{+14.2}
  replicated PAR                                        (-6.4)+70.9*count{+7.1}
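The time formulas in Table 12.3 can be evaluated directly and converted to nanoseconds from the processor clock. The sketch below reads the garbled operators in the extracted table as multiplication. For example, it reads PAR as 19.5+(np-1)×30.5 cycles and a procedure call as 16.5+(nsp-2)×1.1+nap×2.3 cycles. The braced {+7.1} term for replicated SEQ is added only when the replication count is not a compile time constant. As section 12.1 says, these are simulation averages and serve only as an initial guide.

```python
# Sketch: evaluating a few Table 12.3 time formulas (average cycles).
# Assumption: the garbled operators in the extracted table are multiplications.

def cycles_to_ns(cycles: float, processor_mhz: float) -> float:
    """Convert processor cycles to nanoseconds at the given processor clock."""
    return cycles * 1000.0 / processor_mhz


def par_cycles(n_processes: int) -> float:
    """PAR with np component processes: 19.5 + (np - 1) * 30.5 cycles."""
    return 19.5 + (n_processes - 1) * 30.5


def proc_call_cycles(nsp: int, nap: int) -> float:
    """Procedure or function call: 16.5 + (nsp - 2) * 1.1 + nap * 2.3 cycles."""
    return 16.5 + (nsp - 2) * 1.1 + nap * 2.3


def replicated_seq_cycles(count: int, count_is_constant: bool) -> float:
    """Replicated SEQ: (-3.8) + 15.1 * count, plus 7.1 when the replication
    count is not a compile time constant (the figure in braces)."""
    cycles = -3.8 + 15.1 * count
    if not count_is_constant:
        cycles += 7.1
    return cycles


if __name__ == "__main__":
    c = par_cycles(4)
    print(f"PAR of 4 processes: {c} cycles, {cycles_to_ns(c, 20.0):.0f} ns at 20 MHz")
```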


12.2 Fast multiply, TIMES 


The IMS T800 has a fast integer multiplication instruction product. For a positive multiplier its execution time 
is 4+Tbp cycles, and for a negative multiplier 5+Tbe cycles (table 12.1). The time taken for a multiplication 
by zero is 3 cycles. 


Implementations of high level languages on the transputer may take advantage of this instruction. For example,
the occam modulo arithmetic operator TIMES is implemented by the product instruction, and the right-hand operand
is treated as the multiplier. The fast multiplication instruction is also used in high level language implementations
for the multiplication implicit in multi-dimensional array access.
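The product timing rule depends only on the multiplier, which makes it easy to estimate. The sketch below takes 4+Tbp cycles for a positive multiplier, 5+Tbe cycles for a negative one, and 3 cycles for zero. Tbp and Tbe are the most significant set bit, counting from zero. It assumes a 32-bit multiplier and reads "two's complement of a negative multiplier" as its arithmetic negation.

```python
# Sketch: cycle count of the IMS T800 fast multiply (product) from the rule
# above. Assumptions: 32-bit signed multiplier; "two's complement of a
# negative multiplier" is taken to mean its negation (-multiplier).

def product_cycles(multiplier: int) -> int:
    if not -(1 << 31) <= multiplier < (1 << 31):
        raise ValueError("multiplier must fit in a 32-bit signed word")
    if multiplier == 0:
        return 3                               # multiplication by zero
    if multiplier > 0:
        tbp = multiplier.bit_length() - 1      # MSB set, counting from zero
        return 4 + tbp
    tbe = (-multiplier).bit_length() - 1       # MSB set in the negated value
    return 5 + tbe


if __name__ == "__main__":
    for m in (0, 1, 255, -1, -256):
        print(m, product_cycles(m))
```

Small multipliers are cheap, so when the right-hand operand of TIMES is the smaller value, the multiply finishes sooner.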
12.3 Arithmetic


A set of functions is provided within the development system to support the efficient implementation of
multiple length integer arithmetic. In the IMS T800, floating point arithmetic is taken care of by the FPU. In
table 12.4, n gives the number of places shifted, and all arguments and results are assumed to be local. Full
details of these functions are provided in the occam reference manual, supplied as part of the development
system and available as a separate publication.


When calculating the execution time of the predefined maths functions, no time needs to be added for calling 
overhead. These functions are compiled directly into special purpose instructions which are designed to 
support the efficient implementation of multiple length integer arithmetic and floating point arithmetic. 


Table 12.4 Arithmetic performance


Function                  Cycles          +cycles for parameter access†
LONGADD
LONGSUM
LONGSUB
LONGDIFF
LONGPROD
LONGDIV
SHIFTRIGHT   (n<32)
             (n>=32)      n-27
SHIFTLEFT    (n<32)       4+n
             (n>=32)      n-27
NORMALISE    (n<32)       n+6
             (n>=32)      n-25
             (n=64)       4
ASHIFTRIGHT               SHIFTRIGHT+2
ASHIFTLEFT                SHIFTLEFT+4
ROTATERIGHT               SHIFTRIGHT
ROTATELEFT                SHIFTLEFT
FRACMUL                   LONGPROD+4


† Assuming local variables.
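The shift and normalise entries in Table 12.4 are piecewise in n, the number of places shifted. The sketch below encodes only the entries that survive legibly in the table: SHIFTLEFT, NORMALISE, and the ASHIFTLEFT and ROTATELEFT entries defined in terms of SHIFTLEFT. The n=64 case of NORMALISE is checked first because it would otherwise fall into the n>=32 branch. Parameter access cycles are not included.

```python
# Sketch: cycle counts for some Table 12.4 routines, as functions of n,
# the number of places shifted. Parameter access cycles are excluded.

def shiftleft_cycles(n: int) -> int:
    if n < 0:
        raise ValueError("n must be non-negative")
    return 4 + n if n < 32 else n - 27


def ashiftleft_cycles(n: int) -> int:
    return shiftleft_cycles(n) + 4    # ASHIFTLEFT = SHIFTLEFT + 4


def rotateleft_cycles(n: int) -> int:
    return shiftleft_cycles(n)        # ROTATELEFT = SHIFTLEFT


def normalise_cycles(n: int) -> int:
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 64:                       # special case listed in the table
        return 4
    return n + 6 if n < 32 else n - 25


if __name__ == "__main__":
    for n in (0, 8, 31, 32, 40):
        print(n, shiftleft_cycles(n), normalise_cycles(n))
```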
12.4 IMS T800 floating point operations


All references to REAL32 or REAL64 operands within programs compiled for the IMS T800 normally produce 
the following performance figures. 


Table 12.5 Floating point performance 


Names 
variables 
in expression 
assigned to or input to 
in PROC or FUNCTION call, 
corresponding to a REAL 
parameter 


Arithmetic operators 


<= 


Conversions 
REAL32 to - 
REAL64 to - 

To INT32 from - 
To INT64 from - 
INT32 to - 
INT64 to - 


12.4.1 IMS T800 floating point functions
These functions are provided by the development system. They are compiled directly into special purpose
instructions designed to support the efficient implementation of some of the common mathematical functions
of other languages. The functions provide ABS and SQRT for both REAL32 and REAL64 operand types.


Table 12.6 IMS T800 floating point arithmetic performance 


† Assuming local variables.
12.4.2 IMS T800 special purpose functions and procedures


The functions and procedures given in tables 12.8 and 12.9 are provided by the development system to give
access to the special instructions available on the IMS T800. Table 12.7 shows the key to these tables.


Table 12.7 Key to special performance table 


       most significant bit set in the word counting from zero
n      number of words per row (consecutive memory locations)
r      number of rows in the two dimensional move
       number of bits to reverse


Table 12.8 Special purpose functions performance 


Function        Cycles     +cycles for parameter access†
BITCOUNT
CRCBYTE
CRCWORD
BITREVNBIT
BITREVWORD


† Assuming local variables.
Table 12.9 Special purpose procedures performance


Procedure       Cycles          +cycles for parameter access†
                8+(2n+23)*r
                8+(2n+23)*r
CLIP2D


† Assuming local variables.
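Table 12.9 gives 8+(2n+23)×r cycles for the two dimensional block operations, reading the garbled operator in the extracted table as multiplication. Here n is the number of words per row and r is the number of rows (Table 12.7). The sketch below evaluates that formula and converts it to time. It excludes parameter access and assumes program and data are in internal memory.

```python
# Sketch: cycle estimate for a two dimensional block operation from the
# 8 + (2n + 23) * r formula in Table 12.9 (parameter access excluded).

def block2d_cycles(words_per_row: int, rows: int) -> int:
    """n = words per row, r = rows."""
    if words_per_row < 0 or rows < 0:
        raise ValueError("sizes must be non-negative")
    return 8 + (2 * words_per_row + 23) * rows


def block2d_ns(words_per_row: int, rows: int, processor_mhz: float) -> float:
    """Same estimate converted to nanoseconds at the given processor clock."""
    return block2d_cycles(words_per_row, rows) * 1000.0 / processor_mhz


if __name__ == "__main__":
    # A 64 x 64 word block at 20 MHz: 9672 cycles, 483600 ns
    print(block2d_cycles(64, 64), block2d_ns(64, 64, 20.0))
```

The per-row term (23 cycles) dominates for narrow blocks, so moving a few wide rows is cheaper than moving many narrow ones of the same total size.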


12.5 Effect of external memory 


Extra processor cycles may be needed when program and/or data are held in external memory, depending 
both on the operation being performed, and on the speed of the external memory. After a processor cycle 
which initiates a write to memory, the processor continues execution at full speed until at least the next 
memory access. 


Whilst a reasonable estimate may be made of the effect of external memory, the actual performance will 
depend upon the exact nature of the given sequence of operations. 


External memory is characterized by the number of extra processor cycles per external memory cycle, denoted 
as e. For the IMS T800, with the fastest external memory the value of e is 2; a typical value for a large external 
memory is 5. 


If a program is stored in external memory, and e has the value 2 or 3, then no extra cycles need be estimated 
for linear code sequences. For larger values of e, the number of extra cycles required for linear code 
sequences may be estimated at (e-3)/4. A transfer of control may be estimated as requiring e+3 cycles. 


These estimates may be refined for various constructs. In table 12.10, n denotes the number of components
in a construct. In the case of IF, the n'th conditional is the first to evaluate to TRUE, and the costs include the
costs of the conditionals tested. The number of bytes in an array assignment or communication is denoted
by b.


Table 12.10 External memory performance


IMS T800                          Program off chip     Data off chip
Boolean expressions               e-2                  0
IF                                3en-8                en
Replicated IF                     (6e-4)n+7            (5e-2)n+8
Replicated SEQ                    (3e-3)n+2            (4e-2)n
PAR                               (3e-1)n+8            3en+4
Replicated PAR                    (10e-8)n+8           16en-12
ALT                               (2e-4)n+6e           (2e-2)n+10e-8
Array assignment and              0                    max (2e, e(b/2))
communication in
one transputer
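Table 12.10 gives a pair of extra-cycle formulas per construct. The sketch below reads the first column as program off chip and the second as data off chip. It evaluates both for a given e (extra processor cycles per external memory cycle), n (components) and b (bytes). The dictionary keys are hypothetical names chosen for illustration. The results are estimates of extra cycles only, to be added to the internal-memory figures.

```python
# Sketch: extra processor cycles from Table 12.10.
# Assumptions: column 1 = program off chip, column 2 = data off chip;
# the construct keys below are illustrative names, not INMOS identifiers.

EXTRA_CYCLES = {
    # construct: (program off chip, data off chip) as functions of e, n, b
    "boolean":        (lambda e, n, b: e - 2,              lambda e, n, b: 0),
    "if":             (lambda e, n, b: 3 * e * n - 8,      lambda e, n, b: e * n),
    "replicated_if":  (lambda e, n, b: (6 * e - 4) * n + 7, lambda e, n, b: (5 * e - 2) * n + 8),
    "replicated_seq": (lambda e, n, b: (3 * e - 3) * n + 2, lambda e, n, b: (4 * e - 2) * n),
    "par":            (lambda e, n, b: (3 * e - 1) * n + 8, lambda e, n, b: 3 * e * n + 4),
    "replicated_par": (lambda e, n, b: (10 * e - 8) * n + 8, lambda e, n, b: 16 * e * n - 12),
    "alt":            (lambda e, n, b: (2 * e - 4) * n + 6 * e,
                       lambda e, n, b: (2 * e - 2) * n + 10 * e - 8),
    "array":          (lambda e, n, b: 0,                  lambda e, n, b: max(2 * e, e * b / 2)),
}


def extra_cycles(construct: str, e: int, n: int = 0, b: int = 0):
    """Return (program off chip, data off chip) extra cycles."""
    program_fn, data_fn = EXTRA_CYCLES[construct]
    return program_fn(e, n, b), data_fn(e, n, b)


if __name__ == "__main__":
    for e in (2, 3, 4, 5):
        print(e, extra_cycles("replicated_seq", e, n=10))
```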


The following simulation results illustrate the effect of storing program and/or data in external memory. The
results are normalized to 1 for both program and data on chip. The first program (Sieve of Eratosthenes)
is an extreme case, as it is dominated by small, data access intensive loops; it contains no concurrency,
communication, or even multiplication or division. The second program is the pipeline algorithm for Newton-
Raphson square root computation.


Table 12.11 IMS T800 external memory performance 


| Program | e=2 | e=3 | e=4 | e=5 | 
17 [19 


1.5 


On chip


Program off chip 


Data off chip 


Program and data off chip 


12.6 Interrupt latency 


If the process is a high priority one and no other high priority process is running, the latency is as described 
in table 12.12. The timings given are in full processor cycles TPCLPCL; the number of Tm states is also 
given where relevant. Maximum latency assumes all memory accesses are internal ones. 


Table 12.12 Interrupt latency


                                  Typical               Maximum
                                  TPCLPCL   Tm states   TPCLPCL   Tm states
IMS T800 with FPU in use          19        38          78        156
IMS T800 with FPU not in use      19        38          58        116
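Because the latency figures are in processor cycles, their absolute value depends on the processor speed. The sketch below converts them to nanoseconds. At 30 MHz the typical 19-cycle latency comes to about 633 ns, in line with the 630 ns interrupt response quoted in the feature list. The constant names are illustrative.

```python
# Sketch: interrupt latency in nanoseconds from the cycle counts in Table 12.12.
# Constant names are illustrative; the values are the table's TPCLPCL figures.

TYPICAL_CYCLES = 19
MAX_CYCLES_FPU_IN_USE = 78
MAX_CYCLES_FPU_NOT_IN_USE = 58


def latency_ns(cycles: int, processor_mhz: float) -> float:
    """Latency = cycles x processor cycle time (1000 / MHz ns)."""
    return cycles * 1000.0 / processor_mhz


if __name__ == "__main__":
    for mhz in (17.5, 20.0, 25.0, 30.0):
        print(f"{mhz:>4} MHz: typical {latency_ns(TYPICAL_CYCLES, mhz):.0f} ns, "
              f"max (FPU in use) {latency_ns(MAX_CYCLES_FPU_IN_USE, mhz):.0f} ns")
```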
13 Package specifications


13.1 84 pin grid array package

Figure 13.1 IMS T800 84 pin grid array package pinout

Figure 13.2 84 pin grid array package dimensions


Table 13.1 84 pin grid array package dimensions 


                    Millimetres      Inches


Pin diameter
Flange diameter


Chamfer


Package weight is approximately 7.2 grams 


Table 13.2 84 pin grid array package junction to ambient thermal resistance


SYMBOL   PARAMETER                                   MIN  NOM  MAX  UNITS  NOTE
θJA      At 400 linear ft/min transverse airflow               85   °C/W
14 Ordering


This section indicates the designation of speed and package selections for the various devices. Speed of 
Clockin is 5 MHz for all parts. Transputer processor cycle time is nominal; it can be calculated more exactly 
using the phase lock loop factor PLLx, as detailed in the external memory section. 
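With a 5 MHz Clockin on every part, the processor clock and cycle time follow from PLLx. The sketch below assumes the usual transputer relationship, processor clock = Clockin × PLLx, so the processor cycle time is the Clockin period divided by PLLx. Function names are illustrative.

```python
# Sketch: processor clock, PLLx and cycle time for a 5 MHz Clockin.
# Assumption: processor clock = Clockin x PLLx.

CLOCKIN_MHZ = 5.0


def pllx_for(processor_mhz: float) -> float:
    """Phase lock loop factor needed for a given processor clock."""
    return processor_mhz / CLOCKIN_MHZ


def processor_cycle_ns(pllx: float) -> float:
    """Processor cycle time = Clockin period / PLLx, in nanoseconds."""
    return 1000.0 / (CLOCKIN_MHZ * pllx)


if __name__ == "__main__":
    for mhz in (17.5, 20.0, 25.0, 30.0):
        p = pllx_for(mhz)
        print(f"{mhz:>4} MHz: PLLx {p}, cycle time {processor_cycle_ns(p):.1f} ns")
```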


For availability contact local INMOS sales office or authorised distributor. 


Table 14.1 IMS T800 ordering details


INMOS            Processor      Processor     PLLx   Package
designation      clock speed    cycle time

IMS T800-G17S    17.5 MHz       57 ns         3.5    Ceramic Pin Grid
IMS T800-G20S    20.0 MHz       50 ns         4.0    Ceramic Pin Grid
IMS T800-G25S    25.0 MHz       40 ns         5.0    Ceramic Pin Grid
IMS T800-G30S    30.0 MHz       33 ns         6.0    Ceramic Pin Grid
IMS T800-G17M    17.5 MHz       57 ns         3.5    Ceramic Pin Grid MIL Spec
IMS T800-G20M    20.0 MHz       50 ns         4.0    Ceramic Pin Grid MIL Spec


The timing parameters in this datasheet are based on full characterisation of the 17 MHz and 20 MHz 
parts. Data for higher speeds is based on tests on a limited number of samples and may change when full 
characterisation is completed. 